Preface



Computational modelling is becoming the third paradigm of modern science, as predicted by the Nobel Prize winner Ken Wilson at Cornell University in the 1980s. This so-called third paradigm complements theory and experiment in problem solving. In fact, a substantial amount of research activity in engineering, science and industry today involves mathematical modelling, data analysis, computer simulations, and optimization. The main differences in such activities among disciplines are the type of problem of interest and the degree and extent of the modelling activities. This is especially true in subjects ranging from engineering design to industry.

Computational optimization is an important paradigm in itself, with a wide range of applications. In almost all applications in engineering and industry, we try to optimize something, whether to minimize cost and energy consumption or to maximize profit, output, performance and efficiency. In reality, resources, time and money are always limited; consequently, optimization is all the more important. The optimal use of available resources of any sort requires a paradigm shift in scientific thinking, because most real-world applications involve far more complicated factors, parameters and constraints that affect the system behaviour. Consequently, it is not always possible to find the optimal solutions. In practice, we have to settle for suboptimal or even merely feasible solutions that are satisfactory, robust, and practically achievable in a reasonable time.

This search for optimality is complicated further by the fact that uncertainty is almost always present in real-world systems. For example, material properties always have a certain degree of inhomogeneity. Available materials that are not up to the standards of the design will affect the chosen design significantly. Therefore, we seek not only the optimal design but also a robust design in engineering and industry. Another complication is that most optimization problems are nonlinear and often NP-hard. That is, no known algorithm can find their optimal solutions in polynomial time, and the solution time of the best known methods grows exponentially with problem size. Many engineering applications are indeed NP-hard. Thus, the challenge is to find a workable method to tackle the problem and to search for optimal solutions, though such optimality is not always achievable.

Contemporary engineering design is heavily based on computer simulations. This introduces additional difficulties to optimization. The growing demand for accuracy and the ever-increasing complexity of structures and systems make the simulation process more and more time-consuming. Even with an efficient optimization algorithm, the evaluations of the objective functions are often time-consuming. In many engineering fields, the evaluation of a single design can take several hours, several days or even weeks. Moreover, simulation-based objective functions are inherently noisy, which makes the optimization process even more difficult. Still, simulation-driven design has become a necessity in a growing number of areas, which creates a need for robust and efficient optimization methodologies that can yield satisfactory designs even in the presence of analytically intractable objectives and limited computational resources.

In most engineering design and industrial applications, the objective cannot be expressed in explicit analytical form, as the dependence of the objective on the design variables is complex and implicit. This black-box type of optimization often requires a numerical, computationally expensive simulator, such as a computational fluid dynamics or finite element analysis code. Furthermore, almost all optimization algorithms are iterative and require numerous function evaluations. Therefore, any technique that improves the efficiency of simulators or reduces the number of function evaluations is crucially important. Surrogate-based and knowledge-based optimization methods use approximations to the objective so as to reduce the cost of objective evaluations. The approximations are often local, and their quality evolves as the iterations proceed. Applications of optimization in engineering and industry are diverse. The contents of this book are quite representative and cover all major topics of computational optimization and modelling.

This book contains contributions from experts worldwide who are working in these exciting areas, and each chapter is practically self-contained. The book strives to review and discuss the latest developments in optimization and modelling, with a focus on methods and algorithms of computational optimization, and also covers relevant applications in science, engineering and industry.

We would like to thank our editors, Drs Thomas Ditzinger and Holger Schaepe, 
and staff at Springer for their help and professionalism. Last but not least, we thank 
our families for their help and support. 
Chapter 1

Computational Optimization: An Overview 

Xin-She Yang and Slawomir Koziel 



Abstract. Computational optimization is ubiquitous in many applications in engi- 
neering and industry. In this chapter, we briefly introduce computational optimiza- 
tion, the optimization algorithms commonly used in practice, and the choice of an 
algorithm for a given problem. We introduce and analyze the main components of a 
typical optimization process, and discuss the challenges we may have to overcome 
in order to obtain optimal solutions correctly and efficiently. We also highlight some 
of the state-of-the-art developments in optimization and its diverse applications. 



1.1 Introduction 

Optimization is everywhere, from airline scheduling to finance and from Internet routing to engineering design. Optimization is an important paradigm in itself, with a wide range of applications. In almost all applications in engineering and industry, we are always trying to optimize something, whether to minimize cost and energy consumption or to maximize profit, output, performance and efficiency. In reality, resources, time and money are always limited; consequently, optimization is all the more important in practice [1, 7, 27, 29]. The optimal use of available resources of any sort requires a paradigm shift in scientific thinking, because most real-world applications have far more complicated factors and parameters that affect how the system behaves. The integral components of such an optimization process are computational modelling and search algorithms.

Computational modelling is becoming the third paradigm of modern science, as predicted by the Nobel Prize winner Ken Wilson at Cornell University in the 1980s. This so-called third paradigm complements theory and experiment in problem solving. It is no exaggeration to say that almost all research activities in engineering, science and industry today involve a certain amount of modelling, data analysis, computer simulation, and optimization. The main differences in such activities among disciplines are the type of problem of interest and the degree and extent of the modelling activities. This is especially true in subjects ranging from engineering design to the oil industry and from climate change to economics.

Search algorithms are the tools and techniques for achieving optimality in the problem of interest. This search for optimality is complicated further by the fact that uncertainty is almost always present in real-world systems. For example, material properties such as Young's modulus and strength always show a certain degree of inhomogeneous variation. Available materials that are not up to the standards of the design will affect the chosen design significantly. Therefore, we seek not only the optimal design but also a robust design in engineering and industry. Optimal solutions that are not robust enough are not practical in reality. Suboptimal solutions or good robust solutions are often the choice in such cases.

Contemporary engineering design is heavily based on computer simulations. This introduces additional difficulties to optimization. The growing demand for accuracy and the ever-increasing complexity of structures and systems make the simulation process more and more time-consuming. In many engineering fields, the evaluation of a single design can take several days or even weeks. Moreover, simulation-based objective functions are inherently noisy, which makes the optimization process even more difficult. Still, simulation-driven design has become a necessity in a growing number of areas, which creates a need for robust and efficient optimization methodologies that can yield satisfactory designs even in the presence of analytically intractable objectives and limited computational resources.

1.2 Computational Optimization 

Optimization problems can be formulated in many ways. For example, the commonly used method of least squares is a special case of maximum-likelihood formulations. By far the most widely used formulation is to write a nonlinear optimization problem as

$$\text{minimize} \quad f_i(\mathbf{x}), \qquad (i = 1, 2, \ldots, M), \tag{1.1}$$

subject to the constraints

$$h_j(\mathbf{x}) = 0, \qquad (j = 1, 2, \ldots, J), \tag{1.2}$$

$$g_k(\mathbf{x}) \le 0, \qquad (k = 1, 2, \ldots, K), \tag{1.3}$$

where $f_i$, $h_j$ and $g_k$ are in general nonlinear functions. Here the design vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ can be continuous, discrete or mixed in $n$-dimensional space. The functions $f_i$ are called objective or cost functions, and when $M > 1$, the optimization is multiobjective or multicriteria [21]. It is possible to combine different objectives into a single objective, and we will focus on single-objective optimization problems in most of this book. Although we write the problem here as a minimization problem, it can also be written as a maximization problem by simply replacing $f_i(\mathbf{x})$ by $-f_i(\mathbf{x})$.

In the special case $K = 0$, we have only equality constraints, and the optimization becomes an equality-constrained problem. As an equality $h(\mathbf{x}) = 0$ can be written as two inequalities, $h(\mathbf{x}) \le 0$ and $-h(\mathbf{x}) \le 0$, some formulations in the optimization literature use inequality constraints only. However, in this book, we will explicitly write out equality constraints in most cases.
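As a minimal sketch of this equivalence (the helper names `as_inequalities` and `is_feasible` and the tolerance `tol` are illustrative assumptions, not part of the formulation), the following Python code rewrites each equality constraint h(x) = 0 as the pair h(x) ≤ 0 and -h(x) ≤ 0, and then checks feasibility using inequalities only. Because of floating-point round-off, the pair is tested against a small tolerance, which amounts to requiring |h(x)| ≤ tol.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Constraint = Callable[[Vector], float]


def as_inequalities(equalities: Sequence[Constraint]) -> List[Constraint]:
    """Rewrite each equality h(x) = 0 as the pair h(x) <= 0 and -h(x) <= 0."""
    pairs: List[Constraint] = []
    for h in equalities:
        pairs.append(h)
        pairs.append(lambda x, h=h: -h(x))  # default argument binds the current h
    return pairs


def is_feasible(x: Vector, equalities: Sequence[Constraint],
                inequalities: Sequence[Constraint], tol: float = 1e-9) -> bool:
    """True if every constraint, written in the form g(x) <= 0, holds up to tol."""
    all_ineq = list(inequalities) + as_inequalities(equalities)
    return all(g(x) <= tol for g in all_ineq)


# Hypothetical example: h(x) = x1 + x2 - 1 = 0 and g(x) = -x1 <= 0 (i.e. x1 >= 0).
h = lambda x: x[0] + x[1] - 1.0
g = lambda x: -x[0]
print(is_feasible([0.25, 0.75], [h], [g]))   # True
print(is_feasible([0.5, 0.7], [h], [g]))     # False: h(x) = 0.2 > tol
print(is_feasible([-0.5, 1.5], [h], [g]))    # False: g(x) = 0.5 > tol
```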

When all functions are nonlinear, we are dealing with nonlinear constrained problems. In the special case when $f_i$, $h_j$ and $g_k$ are all linear, the problem becomes linear, and we can use standard linear programming techniques such as the simplex
method. When some design variables can only take discrete values (often integers), 
while other variables are real continuous, the problem is of mixed type, which is 
often difficult to solve, especially for large-scale optimization problems. 

A very special class of optimization is convex optimization [2], which has guaranteed global optimality: any local optimum is also a global optimum. Most importantly, there are efficient polynomial-time algorithms for solving such problems [3]. These efficient algorithms, such as interior-point methods [12], are widely used and have been implemented in many software packages.
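The practical consequence of convexity, namely that a local search cannot get trapped in a non-global optimum, can be illustrated with a short sketch. It is not an interior-point method: it uses plain fixed-step gradient descent on a hypothetical convex quadratic, f(x) = (x1 - 1)^2 + 2(x2 + 0.5)^2, and shows that very different starting points all end at the same minimizer (1, -0.5). The step size 0.1 is an assumption, chosen below 2/L, where L = 4 is the largest curvature of f, so that the iteration converges.

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Fixed-step gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def grad_f(x):
    # Gradient of the convex quadratic f(x) = (x1 - 1)^2 + 2 (x2 + 0.5)^2.
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


starts = [(-10.0, 10.0), (5.0, -3.0), (0.0, 0.0)]
solutions = [gradient_descent(grad_f, s) for s in starts]
for s, x in zip(starts, solutions):
    print(s, "->", [round(v, 6) for v in x])   # every start ends at about (1, -0.5)
```

For a nonconvex objective the same loop could stop at different local minima depending on the start, which is exactly what convexity rules out.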

On the other hand, if some of the functions, such as $f_i$, are integrals, while others, such as $h_j$, are differential equations, the problem becomes an optimal control problem, and special techniques are required to achieve optimality.

For most applications in this book, we will mainly deal with nonlinear con- 
strained global optimization problems with a single objective. In one chapter by 
Coello Coello, multiobjective optimization will be discussed in detail. Optimal con- 
trol and other cases will briefly be discussed in the relevant context in this book. 



1.3 Optimization Procedure 

In essence, an optimization process consists of three components: model, optimizer and simulator (see Fig. 1.1).

The mathematical or numerical model is the representation of the physical prob- 
lem using mathematical equations which can be converted into a numerical model 
and can then be solved numerically. This is the first crucial step in any modelling 
and optimization. If there is any discrepancy between the intended mathematical 
model and the actual model in use, we may solve the wrong mathematical model or 
deal with a different or even wrong problem. Any mathematical model at this stage 
should be double-checked and validated. Once we are confident that the mathematical model is indeed correct, or in most cases an adequate set of approximations, we can proceed to convert it into the right numerical model so that it can be solved numerically and efficiently. Again, it is important to ensure that the right numerical schemes for discretization are used; otherwise, we may solve a different problem numerically. At this stage, we should not only ensure that the numerical model is right, but also ensure that the model can be solved as fast as possible.




Fig. 1.1 A typical optimization process 



Another important step is to use the right algorithm or optimizer so that an optimal combination of design variables can be found. An essential capability of an optimizer is to generate or search for new solutions from a known solution (often a random guess or a known solution from experience), in a way that leads the search process to converge. The ultimate aim of this search process is to find solutions that converge to the global optimum, though this is usually very difficult.

In terms of computing time and cost, the most important step is the use of an efficient evaluator or simulator. In most applications, once a correct model representation is made and implemented, an optimization process involves evaluating the objective function (such as the aerodynamic efficiency of an airfoil) many times, often for thousands or even millions of configurations. Such evaluations often involve computationally intensive tools such as a computational fluid dynamics simulator or a finite element solver. This is the most time-consuming step, often taking 50% to 90% of the overall computing time.
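The division of labour between optimizer and simulator can be sketched as follows. This is a minimal illustration under stated assumptions: `CountingSimulator` and `random_search` are hypothetical names, the "simulator" is a cheap stand-in function where a real application would call a CFD or finite element code, and pure random search is used only because it is the simplest possible optimizer. The point is that the optimizer's cost is naturally measured in simulator calls.

```python
import random
from typing import Callable, Sequence


class CountingSimulator:
    """Wraps an objective function (the 'simulator') and counts its calls."""

    def __init__(self, objective: Callable[[Sequence[float]], float]) -> None:
        self.objective = objective
        self.calls = 0

    def __call__(self, x: Sequence[float]) -> float:
        self.calls += 1
        return self.objective(x)


def random_search(simulator, bounds, budget, seed=0):
    """A deliberately simple optimizer: sample uniformly in the box, keep the best point."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = simulator(x)                 # every call is one (possibly expensive) simulation
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Hypothetical model: a cheap stand-in for what would be a CFD or FE run in practice.
def model(x):
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2


sim = CountingSimulator(model)
x_best, f_best = random_search(sim, bounds=[(-1.0, 1.0), (-1.0, 1.0)], budget=500)
print(sim.calls, f_best)   # 500 simulator calls were spent
```

Swapping in a better optimizer changes how many calls are needed, while swapping in a faster simulator changes how much each call costs; the two levers are independent.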



1.4 Optimizer 

1.4.1 Optimization Algorithms 

An efficient optimizer is very important to ensure the optimal solutions are reach- 
able. The essence of an optimizer is a search or optimization algorithm implemented 
correctly so as to carry out the desired search (though not necessarily efficiently). 
It can be integrated and linked with other modelling components. There are many 
optimization algorithms in the literature and no single algorithm is suitable for all problems, as dictated by the No Free Lunch Theorems [24].
Optimization algorithms can be classified in many ways, depending on the focus or the characteristics we are trying to compare. Algorithms can be classified as gradient-based (or derivative-based) methods and gradient-free (or derivative-free) methods. The classic steepest descent and Gauss-Newton methods are gradient-based, as they use derivative information in the algorithm, while the Nelder-Mead downhill simplex method [18] is a derivative-free method because it only uses the values of the objective, not any derivatives.
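To contrast the two classes, the sketch below implements fixed-step steepest descent, which needs the gradient, next to compass (coordinate pattern) search, which uses objective values only. Compass search is swapped in here as a compact derivative-free example; it is not the Nelder-Mead method cited above. The test function, step size and tolerances are illustrative assumptions.

```python
def steepest_descent(grad, x, step=0.05, iters=500):
    """Gradient-based: repeatedly step along -grad(x) with a fixed step size."""
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def compass_search(f, x, delta=1.0, tol=1e-8):
    """Derivative-free: poll x +/- delta along each axis; halve delta when no poll improves."""
    x, fx = list(x), f(x)
    while delta > tol:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * delta
                fy = f(y)                  # only objective values are used
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            delta /= 2.0
    return x


# Hypothetical test problem with minimum at (2, -1); the step 0.05 is below 2/20 for stability.
f = lambda x: (x[0] - 2.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2
grad = lambda x: [2.0 * (x[0] - 2.0), 20.0 * (x[1] + 1.0)]
print(steepest_descent(grad, [0.0, 0.0]))   # uses the gradient
print(compass_search(f, [0.0, 0.0]))        # uses f only
```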

From a different point of view, algorithms can be classified as trajectory-based 
or population-based. A trajectory-based algorithm typically uses a single agent or 
solution point which will trace out a path as the iterations and optimization pro- 
cess continue. Hill-climbing is trajectory-based, and it links the starting point with 
the final point via a piecewise zigzag path. Another important example is simulated annealing [13], which is a widely used metaheuristic algorithm. On the other hand, population-based algorithms such as particle swarm optimization use multiple agents that interact and trace out multiple paths [11]. Another classic example is genetic algorithms [8, 10].
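A population-based method can be sketched in a few dozen lines. The following is a minimal global-best particle swarm optimizer under stated assumptions: the inertia weight w = 0.729 and acceleration coefficients c1 = c2 = 1.49445 are commonly used textbook settings rather than values from this chapter, the sphere function is a stand-in objective, and positions are not clamped to the initial bounds. Each particle traces its own path, and the particles interact only through the shared best position.

```python
import random


def pso(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.729, c1=1.49445, c2=1.49445, seed=1):
    """Minimal global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]                  # each particle's own best position
    pbest_f = [f(x) for x in xs]
    i_best = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[i_best]), pbest_f[i_best]   # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull towards own best + pull towards swarm best.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = list(xs[i]), fx
                if fx < gbest_f:                   # swarm best is updated immediately
                    gbest, gbest_f = list(xs[i]), fx
    return gbest, gbest_f


sphere = lambda x: sum(v * v for v in x)          # stand-in objective, minimum 0 at the origin
best_x, best_f = pso(sphere, dim=2)
print(best_x, best_f)
```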

Algorithms can also be classified as deterministic or stochastic. If an algorithm works in a mechanically deterministic manner without any randomness, it is called deterministic. Such an algorithm will reach the same final solution if we start from the same initial point. Hill-climbing and downhill simplex are good
examples of deterministic algorithms. On the other hand, if there is some random- 
ness in the algorithm, the algorithm will usually reach a different point every time 
we run the algorithm, even though we start with the same initial point. Genetic al- 
gorithms and hill-climbing with a random restart are good examples of stochastic 
algorithms. 
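The difference between deterministic and stochastic behaviour, and the effect of a random restart, can be seen in a small sketch. The one-dimensional test function f(x) = x^2/20 - cos x, the step size and the number of restarts are illustrative assumptions. Plain hill-climbing started from x = 6 always returns the same point, the local minimum near x = 5.68, whereas hill-climbing with random restarts (seeded here for reproducibility) can reach the global minimum f(0) = -1.

```python
import math
import random


def f(x):
    # 1-D multimodal test function: global minimum f(0) = -1, local minima near x = +/-5.68.
    return x * x / 20.0 - math.cos(x)


def hill_climb(f, x, step=0.01, max_iter=100_000):
    """Deterministic: move to the better neighbour x +/- step until neither improves."""
    fx = f(x)
    for _ in range(max_iter):
        y = min((x - step, x + step), key=f)
        fy = f(y)
        if fy >= fx:
            break
        x, fx = y, fy
    return x, fx


def hill_climb_random_restart(f, n_restarts, lo, hi, seed):
    """Stochastic: run hill_climb from random starting points and keep the best result."""
    rng = random.Random(seed)
    runs = [hill_climb(f, rng.uniform(lo, hi)) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])


print(hill_climb(f, 6.0) == hill_climb(f, 6.0))   # True: same start, same answer
print(hill_climb(f, 6.0))                          # trapped near the local minimum x = 5.68
print(hill_climb_random_restart(f, 30, -10.0, 10.0, seed=42))
```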

Analyzing the stochastic algorithms in more detail, we can single out the type 
of randomness that a particular algorithm is employing. For example, the simplest 
and yet often very efficient method is to introduce a random starting point for a de- 
terministic algorithm. The well-known hill-climbing with random restart is a good 
example. This simple strategy is both efficient in most cases and easy to implement 
in practice. A more elaborate way to introduce randomness into an algorithm is to use randomness inside different components of the algorithm, and in this case, we often call such an algorithm heuristic or, more often, metaheuristic [23, 26]. A very good example is the popular genetic algorithms, which use randomness for crossover and mutation in terms of a crossover probability and a mutation rate. Here, heuristic means to search by trial and error, while metaheuristic means a higher level of heuristics. However, the modern literature tends to refer to all new stochastic algorithms as metaheuristics. In this book, we will use metaheuristic to mean either. It is worth pointing out that metaheuristic algorithms are an active research topic, and new algorithms appear almost yearly [25, 28].

Memory use can be important to some algorithms. Therefore, optimization algo- 
rithms can also be classified as memoryless or history-based. Most algorithms do not 
use memory explicitly, and only the current best or current state is recorded and all 
the search history may be discarded. In this sense, such algorithms can thus be con- 
sidered as memoryless. Genetic algorithms, particle swarm optimization and cuckoo 
search all fit into this category. We should not confuse the use of memory with simply recording the current state, or with elitism (selection of the fittest). On the other hand, some algorithms do use memory/history explicitly. In tabu search [9], tabu lists are used to record the move history so that recently visited solutions will not be tried again in the near future; this encourages the exploration of completely different new solutions, which may save computing effort significantly.
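A minimal tabu search sketch makes the role of memory concrete. The problem (choose a subset of `weights` whose sum is as close as possible to `target`), the tabu tenure of 3 and the aspiration rule are illustrative assumptions. The tabu list stores recently flipped bit positions; a flip on the list is skipped unless it would improve on the best solution found so far, and the best admissible move is accepted even when it is worse than the current one.

```python
from collections import deque


def tabu_search(f, x0, n_iter=50, tenure=3):
    """Minimize f over bit lists by single-bit flips with a short-term tabu list."""
    x = list(x0)
    best_x, best_f = list(x), f(x)
    tabu = deque(maxlen=tenure)  # recently flipped positions (the search memory)
    for _ in range(n_iter):
        candidates = []
        for i in range(len(x)):
            y = list(x)
            y[i] = 1 - y[i]
            fy = f(y)
            # Skip tabu moves unless they beat the best so far (aspiration criterion).
            if i in tabu and fy >= best_f:
                continue
            candidates.append((fy, i, y))
        if not candidates:
            break
        fy, i, y = min(candidates)   # ties broken by the lower bit index
        x = y                        # accept the best admissible move, even if it is worse
        tabu.append(i)
        if fy < best_f:
            best_x, best_f = list(y), fy
    return best_x, best_f


weights = [8, 6, 5, 4, 3, 2]
target = 13
subset_gap = lambda bits: abs(sum(w for w, b in zip(weights, bits) if b) - target)
print(tabu_search(subset_gap, [0] * len(weights)))   # ([1, 0, 1, 0, 0, 0], 0)
```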

Another type of algorithm is the so-called mixed-type or hybrid algorithm, which uses some combination of deterministic components and randomness, or combines one algorithm with another so as to design more efficient algorithms. For example, genetic algorithms can be hybridized with many algorithms such as particle swarm optimization; more specifically, this may involve the use of genetic operators to modify some components of another algorithm.

From the mobility point of view, algorithms can be classified as local or global. 
Local search algorithms typically converge towards a local optimum, not necessarily (and often not) the global optimum; such algorithms are often deterministic and have no ability to escape from local optima. Simple hill-climbing is an example. On the other hand, we always try to find the global optimum for a given problem, and if this global optimum is robust, it is often the best solution, though it is not always possible to find it. For global optimization, local search algorithms are not suitable; we have to use a global search algorithm. Modern metaheuristic algorithms are in most cases intended for global optimization, though they are not always successful or efficient. A simple strategy such as hill-climbing with random restart
may change a local search algorithm into a global search. In essence, randomization 
is an efficient component for global search algorithms. A detailed review of opti- 
mization algorithms will be provided later in the chapter on optimization algorithms 
by Yang. 

Straightforward optimization of a given objective function is not always practical. In particular, if the objective function comes from a computer simulation, it may be computationally expensive, noisy or non-differentiable. In such cases, so-called surrogate-based optimization algorithms may be useful; there, the direct optimization of the function of interest is replaced by iterative updating and re-optimization of its model, a surrogate [5]. The surrogate model is typically constructed from sampled data of the original objective function; it is supposed to be cheap, smooth, easy to optimize and yet reasonably accurate, so that it can produce a good prediction of the function's optimum. Multi-fidelity or variable-fidelity optimization is a special case of surrogate-based optimization in which the surrogate is constructed from a low-fidelity model (or models) of the system of interest [15]. Variable-fidelity optimization is particularly useful if reducing the computational cost of the optimization process is of primary importance.
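The surrogate idea can be sketched in one dimension. This illustration assumes a hypothetical smooth objective `expensive_f(x) = exp(x) - 2x` (minimizer x = ln 2). The surrogate is a parabola interpolating three samples; each iteration minimizes the parabola, evaluates the expensive function only at the predicted minimizer, and discards the worst sample. This is essentially successive parabolic interpolation, chosen for brevity; practical surrogate-based methods use richer models and more safeguards.

```python
import math


def expensive_f(x):
    # Hypothetical stand-in for a costly simulation; true minimizer is x = ln 2.
    return math.exp(x) - 2.0 * x


def quadratic_surrogate_opt(f, xs, max_iter=50, tol=1e-6):
    """Fit a parabola (the surrogate) through three samples, minimize it,
    evaluate f at the surrogate's minimizer, and replace the worst sample."""
    data = [(x, f(x)) for x in xs]           # sampled data of the 'expensive' function
    n_evals = len(data)
    for _ in range(max_iter):
        (a, fa), (b, fb), (c, fc) = sorted(data)
        d_ab = (fb - fa) / (b - a)           # first divided differences
        d_bc = (fc - fb) / (c - b)
        curv = (d_bc - d_ab) / (c - a)       # second divided difference = leading coefficient
        if curv <= 0.0:
            break                            # surrogate has no minimum; a real method would re-sample
        x_new = 0.5 * (a + b) - d_ab / (2.0 * curv)   # minimizer of the parabola
        if min(abs(x_new - x) for x, _ in data) < tol:
            break                            # surrogate predicts no meaningful move
        data.append((x_new, f(x_new)))
        n_evals += 1
        data.remove(max(data, key=lambda p: p[1]))     # drop the worst sample
    x_best, f_best = min(data, key=lambda p: p[1])
    return x_best, f_best, n_evals


x_best, f_best, n_evals = quadratic_surrogate_opt(expensive_f, [0.0, 1.0, 2.0])
print(x_best, math.log(2.0), n_evals)
```

Only one expensive evaluation is spent per iteration; all the "thinking" happens on the cheap surrogate.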

Whatever the classification of an algorithm is, we have to make the right choice and use the algorithm correctly; sometimes a proper combination of algorithms may achieve better results.
1.4.2 Choice of Algorithms

From the optimization point of view, the choice of the right optimizer or algorithm 
for a given problem is crucially important. The algorithm chosen for an optimiza- 
tion task will largely depend on the type of the problem, the nature of an algorithm, 
the desired quality of solutions, the available computing resources, the time limit, the availability of an implementation of the algorithm, and the expertise of the decision-makers [3].

The nature of an algorithm often determines if it is suitable for a particular type 
of problem. For example, gradient-based algorithms such as hill-climbing are not suitable for an optimization problem whose objective is discontinuous. Conversely, the type of problem we are trying to solve also determines which algorithms we can choose. If the objective function of an optimization problem at hand is highly nonlinear and multimodal, classic algorithms such as hill-climbing and downhill simplex are not suitable, as they are local search algorithms. In this case, global optimizers such as particle swarm optimization and cuckoo search are more suitable.

Obviously, the choice is also affected by the desired solution quality and com- 
puting resource. As in most applications, computing resources are limited, we have 
to obtain good solutions (not necessarily the best) in a reasonable and practical time. Therefore, we have to balance resources and solution quality. We usually cannot guarantee the quality of the solutions, though we strive to obtain the best solutions we possibly can. If time is the main constraint, we can use greedy methods or hill-climbing with a few random restarts.

Sometimes, even with the best possible intentions, the availability of an algorithm and the expertise of the decision-makers are the ultimate deciding factors in choosing an algorithm. Even if some algorithms are better, we may not have them implemented in our system or may not have access to them, which limits our choice. For example, Newton's method, hill-climbing, Nelder-Mead downhill simplex, trust-region methods [3] and interior-point methods [19] are implemented in many software packages, which may also increase their popularity in applications.

Even if we have such access, we may not have the experience to use the algorithms properly and efficiently; in this case, we may be more comfortable and more confident using other algorithms we have used before. Our experience may be more valuable in selecting the most appropriate and practical solutions than merely using the best possible algorithms.

In practice, even with the best possible algorithms and well-crafted implementations, we may still not get the desired solutions. This is the nature of nonlinear global optimization: most such problems are NP-hard, and no efficient (polynomial-time) algorithms are known for them. Thus, the challenge of research in computational optimization and applications is to find the algorithms most suitable for a given problem so as to obtain good solutions, hopefully also the global best solutions, in a reasonable timescale with a limited amount of resources. We aim to do this efficiently and in an optimal way.
1.5 Simulator

To solve an optimization problem, the most computationally expensive part is probably the evaluation of the design objective to see whether a proposed solution is feasible and/or optimal. Typically, we have to carry out these evaluations many times, often thousands or even millions of times [25, 27]. Things become even more challenging computationally when each evaluation takes a long time via a black-box simulator. If this simulator is a finite element or CFD solver, each evaluation can take from a few minutes to a few hours or even weeks. Therefore, any approach that saves computational time, either by reducing the number of evaluations or by increasing the simulator's efficiency, will save time and money.
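One simple way to avoid paying twice for the same design is to cache simulator results. The sketch below assumes a hypothetical `simulate` function that takes a hashable tuple of design variables, and wraps it with Python's `functools.lru_cache`. Caching only helps when an optimizer revisits exactly the same point (as grid and pattern searches often do); it does nothing for new designs, and the lookup relies on exact floating-point equality of the inputs.

```python
from functools import lru_cache

calls = 0  # number of real (uncached) simulator runs


@lru_cache(maxsize=None)
def simulate(design):
    """Hypothetical expensive simulator; `design` must be hashable, e.g. a tuple."""
    global calls
    calls += 1
    return sum((v - 1.0) ** 2 for v in design)


# An optimizer that revisits points (as grid or pattern searches often do) hits the cache.
for design in [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]:
    simulate(design)

print(calls)                        # 3: only distinct designs trigger a simulation
print(simulate.cache_info().hits)   # 2: the repeated designs were served from the cache
```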

1.5.1 Numerical Solvers 

In general, a simulator can be a simple function subroutine, a multiphysics solver, or an external black-box evaluator.

The simplest simulator is probably the direct calculation of an objective function with explicit formulas; this is true for standard test functions (e.g., Rosenbrock's function), simple design problems (e.g., pressure vessel design), and many problems in linear programming [4]. This class of optimization problems with explicit objectives and constraints may form the majority of problems dealt with in most textbooks and optimization courses.
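As an example of such an explicit objective, the n-dimensional Rosenbrock function, f(x) = sum over i = 1, ..., n-1 of [100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2], can be coded directly; its global minimum is f = 0 at x = (1, ..., 1). The short sketch below uses this common form of the function; variants with other constants exist.

```python
def rosenbrock(x):
    """n-dimensional Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


print(rosenbrock([1.0, 1.0, 1.0]))   # 0.0 at the global minimum
print(rosenbrock([0.0, 0.0]))        # 1.0
print(rosenbrock([-1.2, 1.0]))       # about 24.2, a common starting point
```

Evaluating such a function costs microseconds, so here the efficiency of the optimizer, not of the simulator, dominates.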

In engineering and industrial applications, the objectives are often implicit and 
can only be evaluated through a numerical simulator, often black-box type. For ex- 
ample, in the design of an airfoil, the aerodynamic performance can only be eval- 
uated either numerically or experimentally. Experiments are too expensive in most 
cases, and thus the only sensible tool is a finite-volume-based CFD solver, which can 
be called for a given setting of design parameters. In structural engineering, the design of a structure or building is often evaluated against certain design codes and then by a finite element software package, which often takes days or even weeks to run. The evaluation of a proposed solution in real-world applications is often multidisciplinary; it could involve stress-strain analysis, heat transfer, diffusion, electromagnetic waves, electrochemistry, and others. These phenomena are often coupled, which makes the simulations a daunting, if not impossible, task. Even so, more and more optimization and design tasks require such evaluations, and the good news is that computing speed is increasing and many efficient numerical methods are becoming routine.

In some rare cases, the optimization objective cannot be written explicitly, and 
cannot be evaluated using any simulation tools. The only possibility is to use some 
external means to carry out such evaluations. This often requires experiments, trial and error, or some combination of numerical tools, experiments and human expertise. This scenario may imply a lack of understanding of the system or mechanisms, or that we have not formulated the problem properly. Sometimes, certain reformulations can provide better solutions to the problem. For example, many design problems can be simulated using neural networks and support vector machines. In this case, we know certain objectives of the design, but the relationship between the parameter settings and the system performance/output is not only implicit, but also changes dynamically through iterative learning/training. Fuzzy systems are another example; in this case, special techniques and methods are used, which essentially form a different subject.

In this book, we will mainly focus on the cases in which the objective can be 
evaluated either using explicit formulas or using black-box numerical tools/solvers. 
Some case studies of optimization using neural networks will be provided as well. 

1.5.2 Simulation Efficiency 

In terms of computational effort, an efficient simulator is paramount in controlling 
the overall efficiency of any computational optimization. If the objectives can be 
evaluated using explicit functions or formulas, the main barrier is the choice and use 
of an efficient optimizer. In most cases, however, evaluation via a numerical solver such as an FE/CFD package is very expensive. This is the bottleneck of the whole optimization process. Therefore, various methods and approximations are designed either to reduce the number of such expensive evaluations or to use some approximation (more often, a good combination of both).

The main way to reduce the number of objective evaluations is to use an efficient algorithm, so that only a small number of such evaluations are needed. In most cases, however, an efficient algorithm alone is not enough. We have to use approximation techniques to estimate the objectives, or to construct an approximation model that predicts the solver's outputs without actually using the solver. Another way is to replace the original objective function by its lower-fidelity model, e.g., one obtained from a computer simulation based on a coarsely discretized structure of interest. The low-fidelity model is faster but not as accurate as the original one, and therefore it has to be corrected. Special techniques have to be applied to use an approximation or a corrected low-fidelity model in the optimization process so that the optimal design can be obtained at a low computational cost. All of this falls into the category of surrogate-based optimization [14, 15, 16, 17, 20].
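The correction of a low-fidelity model can be illustrated with a one-dimensional sketch of a first-order additive correction, which is only one of several possible correction schemes; the chapter does not prescribe a particular one. All model functions are hypothetical: `f_high` plays the expensive model, `f_low` a cheap but biased one, and the high-fidelity derivative is assumed to be available (for example from an adjoint solver). At each iterate the low-fidelity model is shifted so that its value and slope match the high-fidelity model; the corrected model is then minimized cheaply by golden-section search within an assumed radius of 5 around the iterate.

```python
import math


def f_high(x):
    # Hypothetical expensive high-fidelity model; its true minimizer is x = 2.
    return math.exp(x - 2.0) - (x - 2.0)


def df_high(x):
    # Assumed available (e.g. from an adjoint solver) at the cost of one extra run.
    return math.exp(x - 2.0) - 1.0


def f_low(x):
    # Hypothetical cheap low-fidelity model: similar shape, wrong location and level.
    return 0.5 * (x - 1.5) ** 2 + 1.0


def df_low(x):
    return x - 1.5


def golden_section(g, lo, hi, tol=1e-10):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0       # about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            b, d = d, c                    # minimum lies in [a, d]
            c = b - r * (b - a)
        else:
            a, c = c, d                    # minimum lies in [c, b]
            d = a + r * (b - a)
    return 0.5 * (a + b)


def corrected_lowfi_opt(x, n_iter=10, radius=5.0):
    """Repeatedly correct f_low at x and move to the minimizer of the corrected model."""
    high_calls = 0
    for _ in range(n_iter):
        d0 = f_high(x) - f_low(x)          # value mismatch at x
        d1 = df_high(x) - df_low(x)        # slope mismatch at x
        high_calls += 2                    # one value and one derivative evaluation

        def corrected(z, xk=x, d0=d0, d1=d1):
            # Matches f_high in value and slope at xk; d0 only shifts the level.
            return f_low(z) + d0 + d1 * (z - xk)

        x = golden_section(corrected, x - radius, x + radius)
    return x, high_calls


x_opt, n_high = corrected_lowfi_opt(0.0)
print(x_opt, n_high)   # x_opt close to 2 after 20 high-fidelity calls
```

The many calls made by `golden_section` only touch the cheap corrected model; the high-fidelity model is queried twice per iteration.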

Surrogate models are approximation techniques used to construct response surface models, or metamodels [22]. The main idea is to approximate or mimic the system behaviour so as to carry out evaluations cheaply and efficiently, while keeping accuracy comparable to that of the actual system. Widely used techniques include polynomial response surface or regression, radial basis functions, ordinary Kriging, artificial neural networks, support vector machines, response correction, space mapping and others. The data used to create the models comes from sampling the design space and evaluating the system at selected locations. Surrogate models can be used as predictive tools in the search for the optimal design of the system of interest. This can be realized by iterative re-optimization of the surrogate (exploitation), by filling the gaps between sample points to improve the global accuracy of the model (exploration of the design space), or by a mixture of both [5]. The new data is used to update the surrogate. A detailed review of surrogate-modeling techniques and surrogate-based optimization methods will be given by Koziel et al. later.
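The sketch below makes iterative re-optimization of a surrogate concrete using the simplest possible response surface: a one-dimensional quadratic interpolated through the three best samples found so far. The function `expensive_model`, the interval $[0, 5]$ and the stopping rule are illustrative assumptions. Only the exploitation step is shown, with no gap-filling exploration. Practical surrogate-based methods such as Kriging or space mapping are considerably more elaborate.

```python
# Toy surrogate-based optimization loop (illustrative sketch): a quadratic
# response surface is fitted to the three best samples so far, and the
# surrogate's minimizer is evaluated with the "expensive" model (exploitation only).

def expensive_model(x):
    """Hypothetical stand-in for a costly simulation; true minimum at x = 2."""
    return (x - 2.0) ** 2 + 0.1 * (x - 2.0) ** 4 + 1.0


def fit_quadratic(p0, p1, p2):
    """Interpolating quadratic a*x^2 + b*x + c through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    f01 = (y1 - y0) / (x1 - x0)          # first divided differences
    f12 = (y2 - y1) / (x2 - x1)
    a = (f12 - f01) / (x2 - x0)          # second divided difference
    b = f01 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c


def surrogate_optimize(f, lo, hi, max_evals=25, tol=1e-9):
    xs = [lo, 0.5 * (lo + hi), hi]       # initial design of experiments
    ys = [f(x) for x in xs]
    while len(xs) < max_evals:
        best3 = sorted(zip(xs, ys), key=lambda p: p[1])[:3]
        a, b, _ = fit_quadratic(*best3)
        if a > 0:
            x_new = min(hi, max(lo, -b / (2.0 * a)))   # surrogate minimizer, clipped
        else:
            # non-convex fit: fall back to the better end of the interval
            x_new = lo if a * lo ** 2 + b * lo < a * hi ** 2 + b * hi else hi
        if min(abs(x_new - x) for x in xs) < tol:
            break                        # no new information: stop
        xs.append(x_new)
        ys.append(f(x_new))              # one expensive evaluation per iteration
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], len(xs)


x_best, f_best, n_evals = surrogate_optimize(expensive_model, 0.0, 5.0)
```

Each loop iteration costs one call to the expensive model. The quantity to watch is therefore `n_evals`, not the (negligible) cost of refitting the surrogate.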



1.6 Latest Developments 

Computational optimization has always been a major research topic in engineering design and industrial applications. New optimization algorithms, numerical methods, approximation techniques and models, and applications are routinely emerging. Loosely speaking, the state-of-the-art developments can be put into three areas: new algorithms, new models, and new applications.

Optimization algorithms are constantly being improved. Classic algorithms such as derivative-free methods and pattern search are being improved and applied, both successfully and efficiently, in new applications.

Evolutionary algorithms and metaheuristics are widely used, and there are many successful examples which will be introduced in great detail later in this book. Sometimes, completely new algorithms appear and are designed for global optimization. Hybridization of different algorithms is also very popular. New algorithms such as particle swarm optimization [11], harmony search [6] and cuckoo search [28] are becoming powerful and popular.

As we will see later, this book summarizes the latest developments of these algorithms in the context of optimization and applications.

Many studies have focused on the methods and techniques of constructing appropriate surrogate models of the high-fidelity simulation data. Surrogate modeling methodologies as well as surrogate-based optimization techniques have improved significantly. The developments of various aspects of surrogate-based optimization will be summarized in this book. These include the design of experiments schemes, the methods of constructing and validating the surrogate models, and the optimization algorithms exploiting surrogate models, both function-approximation and physically-based ones.

New applications are diverse, and the state-of-the-art developments summarized here include optimization and applications in networks, the oil industry, microwave engineering, aerospace engineering, neural networks, environmental modelling, scheduling, structural engineering, classification, economics, and multi-objective optimization problems.
Chapter 2

Optimization Algorithms 

Xin-She Yang 



Abstract. The right choice of an optimization algorithm can be crucially important in finding the right solutions for a given optimization problem. There exists a diverse range of algorithms for optimization, including gradient-based algorithms,
derivative-free algorithms and metaheuristics. Modern metaheuristic algorithms are 
often nature-inspired, and they are suitable for global optimization. In this chapter, 
we will briefly introduce optimization algorithms such as hill-climbing, trust-region 
method, simulated annealing, differential evolution, particle swarm optimization, 
harmony search, firefly algorithm and cuckoo search. 

2.1 Introduction 

Algorithms for optimization are more diverse than the types of optimization, and the right choice of algorithms is an important issue, as discussed in the overview in the first chapter. There is a wide range of optimization algorithms, and a detailed description of each could take up a whole book of several hundred pages. Therefore, in this chapter, we will introduce a few important algorithms selected from a wide range of optimization algorithms [4, 27, 31], with a focus on the metaheuristic algorithms developed after the 1990s. This selection does not mean that the algorithms not described here are not popular. In fact, they may be equally widely used. Whenever an algorithm is used in this book, we will try to provide enough details so that readers can see how it is implemented. Alternatively, in some cases, enough citations and links will be provided so that interested readers can pursue further research using these references as a good start.
2.2 Derivative-Based Algorithms

Derivative-based or gradient-based algorithms use the information of derivatives. They are very efficient as local search algorithms, but may have the disadvantage of being trapped in a local optimum if the problem of interest is not convex. They require the objective function to be sufficiently smooth so that its first (and often second) derivatives exist. Discontinuity in objective functions may render such methods unsuitable. One of the classical examples is Newton's method, while a modern example is the method of conjugate gradient. Gradient-based methods are widely used in many applications and discrete modelling [3, 20].

2.2.1 Newton's Method and Hill-Climbing

One of the most widely used algorithms is Newton's method, which is a root-finding algorithm as well as a gradient-based optimization algorithm [10]. For a given function $f(x)$, its Taylor expansion

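$$ f(x) = f(x_n) + (\nabla f(x_n))^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x_n) \Delta x + \cdots, \tag{2.1} $$

is written in terms of $\Delta x = x - x_n$ about a fixed point $x_n$. Setting the gradient of this quadratic approximation with respect to $\Delta x$ to zero, that is, $\nabla f(x_n) + \nabla^2 f(x_n) \Delta x = 0$, leads to the following iterative formula

$$ x = x_n - H^{-1} \nabla f(x_n), \tag{2.2} $$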

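where $H^{-1}(x^{(n)})$ is the inverse of the symmetric Hessian matrix $H = \nabla^2 f(x_n)$, which is defined as

$$ H(x) \equiv \nabla^2 f(x) \equiv \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. \tag{2.3} $$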



Starting from an initial guess vector $x^{(0)}$, the iterative Newton's formula for the $n$th iteration becomes

$$ x^{(n+1)} = x^{(n)} - H^{-1}(x^{(n)}) \nabla f(x^{(n)}). \tag{2.4} $$

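To stabilize the convergence (for example, to avoid overshooting when the current point is far from the optimum), we can use a smaller step size $\alpha \in (0, 1]$, which gives the modified Newton's method

$$ x^{(n+1)} = x^{(n)} - \alpha H^{-1}(x^{(n)}) \nabla f(x^{(n)}). \tag{2.5} $$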

It is often time-consuming to calculate the Hessian matrix using second derivatives. In this case, a simple alternative is to use the identity matrix $I$ to approximate $H$ so that $H^{-1} = I$. This leads to the simplest form of a quasi-Newton method

$$ x^{(n+1)} = x^{(n)} - \alpha I \nabla f(x^{(n)}). \tag{2.6} $$

In essence, this is the steepest descent method.
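The sketch below illustrates (2.5) and (2.6) by applying the damped Newton iteration and the steepest descent iteration to a hypothetical smooth test function, $f(x, y) = (x-1)^2 + (x-1)^4/4 + 5(y+2)^2$, whose minimum is at $(1, -2)$. The gradient and Hessian are coded by hand. The fixed step size $\alpha = 0.05$ for steepest descent is an arbitrary choice for this example.

```python
# Damped Newton iteration (2.5) versus steepest descent (2.6) on the
# illustrative function f(x, y) = (x-1)^2 + (x-1)^4/4 + 5(y+2)^2.

def grad(p):
    x, y = p
    return [2.0 * (x - 1.0) + (x - 1.0) ** 3, 10.0 * (y + 2.0)]


def hessian(p):
    x, _ = p
    return [[2.0 + 3.0 * (x - 1.0) ** 2, 0.0], [0.0, 10.0]]


def newton(p, alpha=1.0, n_iter=10):
    for _ in range(n_iter):
        g = grad(p)
        (h11, h12), (h21, h22) = hessian(p)
        det = h11 * h22 - h12 * h21
        # d = H^{-1} g via the closed-form inverse of a 2x2 matrix
        d = [(h22 * g[0] - h12 * g[1]) / det, (-h21 * g[0] + h11 * g[1]) / det]
        p = [p[0] - alpha * d[0], p[1] - alpha * d[1]]
    return p


def steepest_descent(p, alpha=0.05, n_iter=500):
    for _ in range(n_iter):
        g = grad(p)               # H^{-1} replaced by the identity matrix
        p = [p[0] - alpha * g[0], p[1] - alpha * g[1]]
    return p


p_newton = newton([4.0, 3.0])
p_sd = steepest_descent([4.0, 3.0])
```

Starting from $(4, 3)$, a handful of Newton steps reach the minimum essentially to machine precision. Steepest descent with the fixed step, by contrast, needs hundreds of iterations for comparable accuracy, which is the price paid for replacing $H^{-1}$ by $I$.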
For a maximization problem, steepest descent becomes hill-climbing. That is, the aim is to climb up to the highest peak, or to find the highest possible value of an objective $f(x)$ from the current point $x^{(n)}$. From the Taylor expansion of $f(x)$ about $x^{(n)}$, we have

$$ f(x^{(n+1)}) = f(x^{(n)} + \Delta s) \approx f(x^{(n)}) + (\nabla f(x^{(n)}))^T \Delta s, \tag{2.7} $$

where $\Delta s = x^{(n+1)} - x^{(n)}$ is the increment vector. Since we are trying to find a better (higher) approximation to the objective function, it requires that

$$ f(x^{(n)} + \Delta s) - f(x^{(n)}) \approx (\nabla f)^T \Delta s > 0. \tag{2.8} $$

From vector analysis, we know that, for vectors of fixed lengths, the inner product $u^T v$ of two vectors $u$ and $v$ is the largest when they are parallel. Therefore, we have

$$ \Delta s = \alpha \nabla f(x^{(n)}), \tag{2.9} $$

where $\alpha > 0$ is the step size. In the case of minimization, the direction $\Delta s$ is along the steepest descent, that is, in the negative gradient direction.

It is worth pointing out that the choice of the step size $\alpha$ is very important. A very small step size means slow movement towards the local optimum, while a large step may overshoot and subsequently move far away from the local optimum. Therefore, the step size $\alpha = \alpha^{(n)}$ should be different at each iteration. It should be chosen to maximize or minimize the objective function along the search direction, depending on the context of the problem.
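The step size $\alpha^{(n)}$ can be chosen exactly when the objective is a quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$ with $A$ symmetric positive definite. Here $g = Ax - b$ is the gradient, and minimizing $f(x - \alpha g) = f(x) - \alpha g^T g + \frac{1}{2} \alpha^2 g^T A g$ over $\alpha$ gives $\alpha = g^T g / (g^T A g)$. The sketch below applies this line search at every iteration. It is written in minimization form, and the matrix, right-hand side and tolerance are illustrative assumptions.

```python
# Steepest descent with an exact line search for f(x) = x^T A x / 2 - b^T x.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steepest_descent_exact(A, b, x, tol=1e-10, max_iter=10_000):
    for n in range(max_iter):
        g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
        gg = dot(g, g)
        if gg ** 0.5 < tol:
            return x, n
        alpha = gg / dot(g, matvec(A, g))                  # optimal step along -g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, max_iter


A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x_star, iterations = steepest_descent_exact(A, b, [0.0, 0.0])   # solves A x = b
```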



2.2.2 Conjugate Gradient Method 

The conjugate gradient method is one of the most widely used algorithms, and it belongs to a wider class of the so-called Krylov subspace iteration methods. The conjugate gradient method was pioneered by Magnus Hestenes, Eduard Stiefel and Cornelius Lanczos in the 1950s [13]. In essence, the conjugate gradient method solves the following linear system

$$ Au = b, \tag{2.10} $$

where $A$ is often a symmetric positive definite matrix. This system is equivalent to minimizing the following function $f(u)$

$$ f(u) = \frac{1}{2} u^T A u - b^T u + v, \tag{2.11} $$

where $v$ is a scalar constant and can be taken to be zero. We can easily see that $\nabla f(u) = Au - b = 0$ leads to $Au = b$. In theory, these iterative methods are closely related to the Krylov subspace $\mathcal{K}_n$ spanned by $A$ and $b$, as defined by

$$ \mathcal{K}_n(A, b) = \mathrm{span}\{Ib, Ab, A^2 b, \ldots, A^{n-1} b\}, \tag{2.12} $$



16 X.-S. Yang 

where $A^0 = I$.

If we use an iterative procedure to obtain the approximate solution $u_n$ to $Au = b$ at the $n$th iteration, the residual is given by

$$ r_n = b - A u_n, \tag{2.13} $$

which is essentially the negative gradient $-\nabla f(u_n)$.

The search direction vector in the conjugate gradient method can subsequently be determined by

$$ d_{n+1} = r_{n+1} - \frac{d_n^T A r_{n+1}}{d_n^T A d_n} d_n. \tag{2.14} $$

The solution often starts with an initial guess $u_0$ at $n = 0$ (with $d_0 = r_0$), and proceeds iteratively. The above steps can compactly be written as

$$ u_{n+1} = u_n + \alpha_n d_n, \qquad r_{n+1} = r_n - \alpha_n A d_n, \tag{2.15} $$

and

$$ d_{n+1} = r_{n+1} + \beta_n d_n, \tag{2.16} $$

where

$$ \alpha_n = \frac{r_n^T r_n}{d_n^T A d_n}, \qquad \beta_n = \frac{r_{n+1}^T r_{n+1}}{r_n^T r_n}. \tag{2.17} $$

Iterations stop when a prescribed accuracy is reached. In the case when A is not 
symmetric, we can use other algorithms such as the generalized minimal residual 
(GMRES) algorithm developed by Y. Saad and M. H. Schultz in 1986. 
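The iteration (2.15) and (2.16) translates almost line by line into code. The sketch below is a plain-Python version for small dense matrices. It uses the standard step length $\alpha_n = r_n^T r_n / (d_n^T A d_n)$ and coefficient $\beta_n = r_{n+1}^T r_{n+1} / (r_n^T r_n)$. The helper names, the tolerance and the test system are assumptions for illustration, and a practical code would use a sparse matrix-vector product instead.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, u0=None, tol=1e-10, max_iter=None):
    """Solve A u = b for symmetric positive definite A (illustrative sketch)."""
    n = len(b)
    u = [0.0] * n if u0 is None else list(u0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, u))]    # residual r_0 = b - A u_0
    d = list(r)                                         # first direction d_0 = r_0
    rr = dot(r, r)
    for _ in range(max_iter or 10 * n):
        if rr ** 0.5 < tol:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                         # step length alpha_n
        u = [ui + alpha * di for ui, di in zip(u, d)]   # u_{n+1}, (2.15)
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]  # r_{n+1}, (2.15)
        rr_new = dot(r, r)
        beta = rr_new / rr                              # coefficient beta_n
        d = [ri + beta * di for ri, di in zip(r, d)]    # d_{n+1}, (2.16)
        rr = rr_new
    return u


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
u = conjugate_gradient(A, b)
```

In exact arithmetic the method terminates in at most $n$ iterations for an $n \times n$ system. The `10 * n` iteration cap only guards against round-off.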



2.3 Derivative-Free Algorithms 

Algorithms using derivatives are efficient, but may pose certain strict requirements on the objective functions. When discontinuities exist in objective functions, derivative-free algorithms may be more efficient and natural. Hooke-Jeeves pattern search is one of the earliest, and it forms the basis of many modern variants of pattern search. The Nelder-Mead downhill simplex method [19] is another good example of derivative-free algorithms. Furthermore, the widely used trust-region method uses some form of approximation to the objective function in a local region, and many surrogate-based models have strong similarities to the pattern search method.



2.3.1 Pattern Search 

Many search algorithms such as the steepest descent method experience slow convergence near the local minimum. They are also memoryless, because the past information gathered during the search is not used to produce accelerated moves in the future. The only information they use is the current location $x^{(n)}$, and the gradient and value of the



2 Optimization Algorithms 17 

objective itself at step $n$. If past information, such as the steps at $n-1$ and $n$, is properly used to generate a new move at step $n+1$, it may speed up the convergence. The Hooke-Jeeves pattern search method is one such method that incorporates the past history of iterations in producing a new search direction.

The Hooke-Jeeves pattern search method consists of two moves: the exploratory move and the pattern move. The exploratory moves explore the local behaviour and information of the objective function so as to identify any potential sloping valleys if they exist. Consider a given step size $\Delta_i$ $(i = 1, 2, \ldots, d)$, where each coordinate direction can have a different increment. The exploratory movement starts from an initial point and proceeds along each coordinate direction by increasing or decreasing the coordinate by $\pm\Delta_i$. If the new value of the objective function does not increase (for a minimization problem), that is, $f(x_i^{(n)}) \le f(x_i^{(n-1)})$, the exploratory move is considered successful. If it is not successful, then a step is tried in the opposite direction, and the result is updated only if it is successful. When all the $d$ coordinates have been explored, the resulting point forms a base point $x^{(n)}$.

The pattern move intends to move the current base $x^{(n)}$ along the base line $(x^{(n)} - x^{(n-1)})$ from the previous (historical) base point to the current base point. The move is carried out by the following formula

$$ x^{(n+1)} = x^{(n)} + [x^{(n)} - x^{(n-1)}]. \tag{2.18} $$

Then $x^{(n+1)}$ forms a new temporary base point for further new exploratory moves. If the pattern move produces an improvement (a lower value of $f(x)$), the new base point $x^{(n+1)}$ is successfully updated. If the pattern move does not lead to any improvement or a lower value of the objective function, then the pattern move is discarded and a new search starts from $x^{(n)}$. The new search moves should use a smaller step size by reducing the increments to $\Delta_i/\gamma$, where $\gamma > 1$ is the step reduction factor. Iterations continue until the prescribed tolerance $\epsilon$ is met.
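The sketch below is a compact version of the Hooke-Jeeves procedure. It combines exploratory moves along each coordinate with the pattern move (2.18) and the step reduction $\Delta_i/\gamma$. For simplicity it uses the same initial increment in every direction. It also accepts an exploratory move only on strict improvement, to avoid cycling on flat regions. These choices, the parameter values and the quadratic test function `quad` are assumptions of this example.

```python
def hooke_jeeves(f, x0, step=0.5, gamma=2.0, eps=1e-6, max_iter=100_000):
    """Minimize f by Hooke-Jeeves pattern search (illustrative sketch)."""

    def explore(point, f_point, steps):
        # Exploratory move: try +Delta_i, then -Delta_i, along each coordinate.
        x, fx = list(point), f_point
        for i, delta in enumerate(steps):
            for trial_step in (delta, -delta):
                trial = list(x)
                trial[i] += trial_step
                f_trial = f(trial)
                if f_trial < fx:              # keep the move only if it improves f
                    x, fx = trial, f_trial
                    break
        return x, fx

    steps = [step] * len(x0)
    base, f_base = list(x0), f(x0)
    for _ in range(max_iter):
        if max(steps) < eps:                  # prescribed tolerance reached
            break
        x_new, f_new = explore(base, f_base, steps)
        if f_new < f_base:
            # Pattern moves (2.18): keep extrapolating along x_new - base while they help.
            while True:
                pattern = [2.0 * xn - xb for xn, xb in zip(x_new, base)]
                base, f_base = x_new, f_new
                x_new, f_new = explore(pattern, f(pattern), steps)
                if f_new >= f_base:
                    break
        else:
            steps = [s / gamma for s in steps]   # shrink increments to Delta_i / gamma
    return base, f_base


def quad(p):
    """Hypothetical test function with minimum at (1, 2)."""
    return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2 + 0.5 * (p[0] - 1.0) * (p[1] - 2.0)


x_hj, f_hj = hooke_jeeves(quad, [-3.0, 5.0])
```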



2.3.2 Trust-Region Method 

The so-called trust-region method is among the most widely used optimization algorithms. Its fundamental ideas have developed over many years, with many seminal papers by a dozen pioneers. A good historical review of trust-region methods can be found in [3, 6]. In 1970, Powell proved the global convergence of the trust-region method [22].

In the trust-region algorithm, a fundamental step is to approximate the nonlinear objective function by using truncated Taylor expansions, often in a quadratic form, within a so-called trust region, whose shape is typically a hyperellipsoid.

The approximation to the objective function in the trust region makes it simpler to find the next trial solution $x_{k+1}$ from the current solution $x_k$. We then intend to find $x_{k+1}$ with a sufficient decrease in the objective function. How good the approximation $\phi_k$ is to the actual objective $f(x)$ can be measured by the ratio of the achieved decrease to the predicted decrease

$$ \rho_k = \frac{f(x_k) - f(x_{k+1})}{\phi_k(x_k) - \phi_k(x_{k+1})}. \tag{2.19} $$

If this ratio is close to unity, we have a good approximation, and we should then move the trust region to $x_{k+1}$ (possibly enlarging it). If the ratio is small or negative, the step is rejected and the trust region is shrunk. The trust region should move and update iteratively until an optimum is found or until a fixed number of iterations is reached.
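One of the simplest trust-region variants uses the Cauchy point. This is the minimizer of the quadratic model $\phi_k$ along the steepest-descent direction, restricted to a spherical region of radius $\Delta$. The sketch below uses a hypothetical test function whose minimum is at $(1, -2)$. It uses the ratio (2.19) to accept or reject each step and to shrink or enlarge the region. The thresholds 0.25 and 0.75, the acceptance level $\eta = 0.1$ and the maximum radius are common textbook choices assumed here; they are not values given in this chapter.

```python
import math


def f(p):
    x, y = p
    return (x - 1.0) ** 2 + 0.1 * (x - 1.0) ** 4 + 10.0 * (y + 2.0) ** 2


def grad(p):
    x, y = p
    return [2.0 * (x - 1.0) + 0.4 * (x - 1.0) ** 3, 20.0 * (y + 2.0)]


def hessian(p):
    x, _ = p
    return [[2.0 + 1.2 * (x - 1.0) ** 2, 0.0], [0.0, 20.0]]


def trust_region(x, delta=1.0, delta_max=10.0, eta=0.1, tol=1e-8, max_iter=1000):
    for _ in range(max_iter):
        g, B = grad(x), hessian(x)
        g_norm = math.hypot(*g)
        if g_norm < tol:
            break
        Bg = [B[0][0] * g[0] + B[0][1] * g[1], B[1][0] * g[0] + B[1][1] * g[1]]
        gBg = g[0] * Bg[0] + g[1] * Bg[1]
        # Cauchy point: minimizer of the quadratic model phi_k along -g inside the region
        tau = 1.0 if gBg <= 0 else min(g_norm ** 3 / (delta * gBg), 1.0)
        p = [-tau * delta / g_norm * gi for gi in g]
        Bp = [B[0][0] * p[0] + B[0][1] * p[1], B[1][0] * p[0] + B[1][1] * p[1]]
        predicted = -(g[0] * p[0] + g[1] * p[1]) - 0.5 * (p[0] * Bp[0] + p[1] * Bp[1])
        x_trial = [x[0] + p[0], x[1] + p[1]]
        rho = (f(x) - f(x_trial)) / predicted        # ratio (2.19)
        if rho < 0.25:
            delta *= 0.25                            # poor model: shrink the region
        elif rho > 0.75 and tau == 1.0:
            delta = min(2.0 * delta, delta_max)      # good model, step on boundary: expand
        if rho > eta:
            x = x_trial                              # accept the step
    return x


x_opt = trust_region([4.0, 3.0])
```

Because only the Cauchy point is used, this variant behaves like steepest descent with a safeguarded step. More refined variants (dogleg, exact subproblem solutions) use the full quadratic model.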

There are many other methods. One of the most powerful and widely used is the polynomial-time algorithm for linear programming called the interior-point method [16], and many variants have been developed since 1984.

All the above algorithms are deterministic, as they have no random
components. Thus, they usually have some disadvantages in dealing with highly 
nonlinear, multimodal, global optimization problems. In fact, some randomization is 
useful and necessary in algorithms, and metaheuristic algorithms are such powerful 
techniques. 



2.4 Metaheuristic Algorithms 

Metaheuristic algorithms are often nature-inspired, and they are now among the 
most widely used algorithms for optimization. They have many advantages over 
conventional algorithms, as discussed in the first chapter for introduction and 
overview. There are a few recent books which are solely dedicated to metaheuristic algorithms [27, 29, 30]. Metaheuristic algorithms are very diverse, including genetic algorithms, simulated annealing, differential evolution, ant and bee algorithms,
particle swarm optimization, harmony search, firefly algorithm, cuckoo search and 
others. Here we will introduce some of these algorithms briefly. 



2.4.1 Simulated Annealing

Simulated annealing, developed by Kirkpatrick et al. in 1983, is among the first metaheuristic algorithms, and it has been applied in almost every area of optimization [17]. Unlike the gradient-based methods and other deterministic search methods, the main advantage of simulated annealing is its ability to avoid being trapped in local minima. The basic idea of the simulated annealing algorithm is to use random search in terms of a Markov chain. This chain not only accepts changes that improve the objective function, but also keeps some changes that are not ideal.

In a minimization problem, for example, any better moves or changes that decrease the value of the objective function $f$ will be accepted. However, some changes that increase $f$ will also be accepted with a probability $p$. This probability $p$, also called the transition probability, is determined by

$$ p = \exp\left[-\frac{\Delta E}{k_B T}\right], \tag{2.20} $$

where $k_B$ is the Boltzmann constant, $T$ is the temperature for controlling the annealing process, and $\Delta E$ is the change of the energy level. This transition probability is based on the Boltzmann distribution in statistical mechanics.

The simplest way to link $\Delta E$ with the change of the objective function $\Delta f$ is to use

$$ \Delta E = \gamma \Delta f, \tag{2.21} $$

where $\gamma$ is a real constant. For simplicity, and without losing generality, we can use $k_B = 1$ and $\gamma = 1$. Thus, the probability $p$ simply becomes

$$ p(\Delta f, T) = e^{-\Delta f / T}. \tag{2.22} $$

To decide whether or not a change is accepted, a random number $r$, drawn uniformly from $[0, 1]$, is often used as a threshold. Thus, if $p > r$, or

$$ p = e^{-\Delta f / T} > r, \tag{2.23} $$

the move is accepted.

Here the choice of the right initial temperature is crucially important. For a given change $\Delta f$, if $T$ is too high ($T \to \infty$), then $p \to 1$, which means almost all the changes will be accepted. If $T$ is too low ($T \to 0$), then any $\Delta f > 0$ (a worse solution) will rarely be accepted as $p \to 0$, and thus the diversity of the solution is limited, but any improvement ($\Delta f < 0$) will always be accepted. In fact, the special case $T \to 0$ corresponds to classical hill-climbing, because only better solutions are accepted and the system is essentially climbing up or descending along a hill. Therefore, if $T$ is too high, the system is at a high energy state on the topological landscape, and the minima are not easily reached. If $T$ is too low, the system may be trapped in a local minimum (not necessarily the global minimum). There is then not enough energy for the system to jump out of the local minimum to explore other minima, including the global minimum. So a proper initial temperature should be calculated.

Another important issue is how to control the annealing or cooling process so that the system cools down gradually from a higher temperature and ultimately freezes to a global minimum state. Among the many ways of controlling the cooling rate or the decrease of the temperature, geometric cooling schedules are widely used. They essentially decrease the temperature by a cooling factor $0 < \alpha < 1$, so that $T$ is replaced by $\alpha T$, or

$$ T(t) = T_0 \alpha^t, \quad t = 1, 2, \ldots, t_f, \tag{2.24} $$

where $T_0$ is the initial temperature and $t_f$ is the maximum number of iterations. The advantage of this method is that $T \to 0$ when $t \to \infty$, and thus there is no need to specify the maximum number of iterations if a tolerance or accuracy is prescribed. Simulated annealing has been applied in a wide range of optimization problems [17, 20].
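The sketch below puts (2.22), (2.23) and the geometric cooling of (2.24) together. It assumes Gaussian random-walk proposals with a fixed standard deviation, an initial temperature $T_0 = 1$, a cooling factor $\alpha = 0.95$ and 50 trials per temperature level. It minimizes the simple two-dimensional test function $f(x) = x_1^2 + x_2^2$. All these settings are illustrative choices, not recommendations from the text, and the best solution seen is recorded separately from the current state of the chain.

```python
import math
import random


def acceptance_probability(delta_f, T):
    """Transition probability (2.22) with k_B = gamma = 1."""
    if delta_f <= 0:
        return 1.0                        # improvements are always accepted
    return math.exp(-delta_f / T)


def simulated_annealing(f, x0, T0=1.0, alpha=0.95, T_min=1e-4,
                        steps_per_T=50, sigma=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, f_best = list(x), fx
    T = T0
    while T > T_min:
        for _ in range(steps_per_T):
            # random-walk proposal (Gaussian step; an assumption of this sketch)
            cand = [xi + rng.gauss(0.0, sigma) for xi in x]
            f_cand = f(cand)
            if acceptance_probability(f_cand - fx, T) > rng.random():   # test (2.23)
                x, fx = cand, f_cand
                if fx < f_best:
                    best, f_best = list(x), fx
        T *= alpha                        # geometric cooling, T <- alpha * T as in (2.24)
    return best, f_best


def sphere(v):
    return sum(vi * vi for vi in v)


best, f_best = simulated_annealing(sphere, [3.0, -2.0])
```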



2.4.2 Genetic Algorithms and Differential Evolution 

Simulated annealing is a trajectory-based algorithm, as it only uses a single agent. Other algorithms, such as genetic algorithms, use multiple agents or a population to carry out the search, which may offer some advantage due to their potential parallelism.

Genetic algorithms are a classic class of algorithms based on the abstraction of Darwin's evolution of biological systems, pioneered by J. Holland and his collaborators in the 1960s and 1970s [14]. Holland was the first to use genetic operators such as crossover and recombination, mutation, and selection in the study of adaptive and artificial systems. Genetic algorithms have two main advantages over traditional algorithms: the ability to deal with complex problems, and parallelism. Whether the objective function is stationary or transient, linear or nonlinear, continuous or discontinuous, it can be dealt with by genetic algorithms. Multiple genes are suitable for parallel implementation.

The three main components or genetic operators in genetic algorithms are crossover, mutation, and selection of the fittest. Each solution is encoded in a string (often binary or decimal), called a chromosome. The crossover of two parent strings produces offspring (new solutions) by swapping parts or genes of the chromosomes. Crossover has a higher probability, typically 0.8 to 0.95. On the other hand, mutation is carried out by flipping some digits of a string, which generates new solutions. The mutation probability is typically low, from 0.001 to 0.05. New solutions generated in each generation will be evaluated by their fitness, which is linked to the objective function of the optimization problem. The new solutions are selected according to their fitness (selection of the fittest). Sometimes, in order to make sure that the best solutions remain in the population, the best solutions are passed on to the next generation without much change; this is called elitism.
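The sketch below shows a binary genetic algorithm that maximizes the hypothetical one-dimensional fitness $1 - (x - 0.3)^2$ on $[0, 1]$ using 16-bit chromosomes. It uses one-point crossover with probability 0.9, bit-flip mutation with probability 0.01 per bit, and elitism as described above. Binary tournament selection is used as one common way of implementing selection of the fittest. This is an assumption, since the text does not prescribe a selection scheme.

```python
import random


def decode(bits, lo=0.0, hi=1.0):
    """Map a binary chromosome to a real number in [lo, hi]."""
    return lo + (hi - lo) * int("".join(map(str, bits)), 2) / (2 ** len(bits) - 1)


def genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=100,
                      p_c=0.9, p_m=0.01, n_elite=2, seed=1):
    rng = random.Random(seed)

    def score(chromosome):
        return fitness(decode(chromosome))

    def select(ranked):
        # binary tournament selection (an assumption; other schemes are possible)
        a, b = rng.sample(ranked, 2)
        return a if score(a) >= score(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        new_pop = [list(ch) for ch in ranked[:n_elite]]          # elitism
        while len(new_pop) < pop_size:
            p1, p2 = select(ranked), select(ranked)
            if rng.random() < p_c:                                # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = list(p1), list(p2)
            for child in (c1, c2):
                for i in range(n_bits):                           # bit-flip mutation
                    if rng.random() < p_m:
                        child[i] = 1 - child[i]
                if len(new_pop) < pop_size:
                    new_pop.append(child)
        pop = new_pop
    best = max(pop, key=score)
    return decode(best), score(best)


def objective(x):
    return 1.0 - (x - 0.3) ** 2          # maximum at x = 0.3


x_best, fit_best = genetic_algorithm(objective)
```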

Genetic algorithms have been applied to almost all areas of optimization, design and applications. There are hundreds of good books and thousands of research articles. There are many variants and hybridizations with other algorithms, and interested readers can refer to more advanced literature such as [12, 14].

Differential evolution (DE) was developed by R. Storn and K. Price in their seminal papers of 1996 and 1997 [25, 26]. It is a vector-based evolutionary algorithm, and can be considered a further development of genetic algorithms. It is a stochastic search algorithm with a self-organizing tendency, and it does not use the information of derivatives. Thus, it is a population-based, derivative-free method.

As in genetic algorithms, design parameters in a $d$-dimensional search space are represented as vectors, and various genetic operators are applied to them. However, unlike genetic algorithms, which typically operate on the bits of strings, differential evolution carries out operations over each component (or each dimension of the solution). Almost everything is done in terms of vectors. For example, in genetic algorithms, mutation is carried out at one site or multiple sites of a chromosome. In differential evolution, by contrast, a difference vector of two randomly-chosen population vectors is used to perturb an existing vector. Such vectorized mutation can be viewed as a self-organizing search, directed towards an optimality.

For a $d$-dimensional optimization problem with $d$ parameters, a population of $n$ solution vectors $x_i$, where $i = 1, 2, \ldots, n$, is initially generated. For each solution $x_i$ at any generation $t$, we use the conventional notation

$$ x_i^t = (x_{1,i}^t, x_{2,i}^t, \ldots, x_{d,i}^t), \tag{2.25} $$

which consists of $d$ components in the $d$-dimensional space. This vector can be considered as the chromosome or genome.

Differential evolution consists of three main steps: mutation, crossover and 
selection. 

Mutation is carried out by the mutation scheme. For each vector $x_i$ at any time or generation $t$, we first randomly choose three distinct vectors $x_p$, $x_q$ and $x_r$ at $t$, and then generate a so-called donor vector by the mutation scheme

$$ v_i^{t+1} = x_p^t + F (x_q^t - x_r^t), \tag{2.26} $$

where $F \in [0, 2]$ is a parameter, often referred to as the differential weight. Since $x_p$, $x_q$ and $x_r$ must also be distinct from $x_i$, this requires that the population size satisfies $n \ge 4$. In principle, $F \in [0, 2]$, but in practice, a scheme with $F \in [0, 1]$ is more efficient and stable.

The crossover is controlled by a crossover probability $C_r \in [0, 1]$, and the actual crossover can be carried out in two ways: binomial and exponential. Crossover mixes the components of the donor vector $v_i^{t+1}$ with those of $x_i^t$ to produce a trial vector $u_i^{t+1}$. Selection is essentially the same as that used in genetic algorithms. It selects the fittest, that is, for a minimization problem, the one with the minimum objective value. Therefore, we have

$$ x_i^{t+1} = \begin{cases} u_i^{t+1} & \text{if } f(u_i^{t+1}) \le f(x_i^t), \\ x_i^t & \text{otherwise}. \end{cases} \tag{2.27} $$

Most studies have focused on the choice of $F$, $C_r$ and $n$ as well as the modification of (2.26). In fact, when generating mutation vectors, we can use many different ways of formulating (2.26). This leads to various schemes with the naming convention DE/x/y/z, where x is the mutation scheme (rand or best), y is the number of difference vectors, and z is the crossover scheme (binomial or exponential). The basic DE/Rand/1/Bin scheme is given in (2.26). Following a similar strategy, we can design various schemes. In fact, 10 different schemes have been formulated, and for details, readers can refer to [23].
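The sketch below is a minimal DE/Rand/1/Bin implementation of the mutation (2.26) and selection (2.27) steps. The crossover is not spelled out above, so the standard binomial form is assumed: each component is taken from the donor with probability $C_r$, and one randomly chosen component always is. The parameter values $F = 0.5$, $C_r = 0.9$ and $n = 20$, and the sphere test function, are illustrative choices.

```python
import random


def differential_evolution(f, bounds, n=20, F=0.5, Cr=0.9, generations=200, seed=2):
    """DE/rand/1/bin sketch: mutation (2.26), binomial crossover, selection (2.27)."""
    rng = random.Random(seed)
    d = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(n):
            # three distinct vectors, all different from x_i (hence n >= 4)
            p, q, r = rng.sample([k for k in range(n) if k != i], 3)
            donor = [pop[p][j] + F * (pop[q][j] - pop[r][j]) for j in range(d)]
            j_rand = rng.randrange(d)     # guarantees at least one donor component
            trial = [donor[j] if (rng.random() < Cr or j == j_rand) else pop[i][j]
                     for j in range(d)]
            f_trial = f(trial)
            if f_trial <= fit[i]:         # greedy selection (2.27)
                pop[i], fit[i] = trial, f_trial
    best = min(range(n), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    return sum(xi * xi for xi in x)


x_best, f_best = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

This sketch updates the population in place within a generation and does not enforce the bounds on trial vectors. Both are simplifications.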



2.4.3 Particle Swarm Optimization 

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart in 1995 [15], based on swarm behaviour such as fish and bird schooling in nature. Since then, PSO has generated much wider interest, and it forms an exciting, ever-expanding research subject called swarm intelligence. PSO has been applied to almost every area in optimization, computational intelligence, and design/scheduling applications. There are at least two dozen PSO variants, and hybrid algorithms combining PSO with other existing algorithms are also increasingly popular.

This algorithm searches the space of an objective function by adjusting the trajectories of individual agents, called particles, as the piecewise paths formed by positional vectors in a quasi-stochastic manner. The movement of a swarming particle consists of two major components: a stochastic component and a deterministic component. Each particle is attracted toward the position of the current global best $g^*$ and its own best location $x_i^*$ in history, while at the same time it has a tendency to move randomly.

Let $x_i$ and $v_i$ be the position vector and velocity for particle $i$, respectively. The new velocity vector is determined by the following formula

$$ v_i^{t+1} = v_i^t + \alpha \epsilon_1 \odot [g^* - x_i^t] + \beta \epsilon_2 \odot [x_i^* - x_i^t], \tag{2.28} $$

where $\epsilon_1$ and $\epsilon_2$ are two random vectors, with each entry taking values between 0 and 1. The Hadamard product $u \odot v$ of two matrices is defined as the entrywise product, that is, $[u \odot v]_{ij} = u_{ij} v_{ij}$. The parameters $\alpha$ and $\beta$ are the learning parameters or acceleration constants, which can typically be taken as, say, $\alpha \approx \beta \approx 2$.

The initial locations of all particles should be distributed relatively uniformly so that they can sample over most regions, which is especially important for multimodal problems. The initial velocity of a particle can be taken as zero, that is, $v_i^{t=0} = 0$. The new position can then be updated by

$$ x_i^{t+1} = x_i^t + v_i^{t+1}. \tag{2.29} $$

Although $v_i$ can take any values, each of its components is usually bounded in some range $[-v_{\max}, v_{\max}]$.

There are many variants which extend the standard PSO algorithm [15, 30, 31]. The most noticeable improvement is probably to use an inertia function $\theta(t)$, so that $v_i^t$ is replaced by $\theta(t) v_i^t$:

$$ v_i^{t+1} = \theta v_i^t + \alpha \epsilon_1 \odot [g^* - x_i^t] + \beta \epsilon_2 \odot [x_i^* - x_i^t], \tag{2.30} $$

where $\theta$ takes values between 0 and 1. In the simplest case, the inertia function can be taken as a constant, typically $\theta \approx 0.5 \sim 0.9$. This is equivalent to introducing a virtual mass to stabilize the motion of the particles, and thus the algorithm is expected to converge more quickly.
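The sketch below implements the inertia-weight update (2.30), the position update (2.29) and the velocity bound. The inertia $\theta = 0.7$ lies in the range quoted above. The values $\alpha = \beta = 1.5$ (rather than 2) and $v_{\max} = 1$ are assumptions; acceleration constants somewhat below 2 are a common choice when an inertia term is used. The global best is updated as soon as a particle improves on it.

```python
import random


def pso(f, bounds, n=20, theta=0.7, alpha=1.5, beta=1.5, v_max=1.0,
        iterations=200, seed=3):
    rng = random.Random(seed)
    d = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    v = [[0.0] * d for _ in range(n)]                 # zero initial velocities
    x_star = [list(xi) for xi in x]                   # personal bests x_i^*
    f_star = [f(xi) for xi in x]
    k = min(range(n), key=f_star.__getitem__)
    g_star, f_g = list(x_star[k]), f_star[k]          # global best g^*
    for _ in range(iterations):
        for i in range(n):
            for j in range(d):
                e1, e2 = rng.random(), rng.random()
                vij = (theta * v[i][j]
                       + alpha * e1 * (g_star[j] - x[i][j])
                       + beta * e2 * (x_star[i][j] - x[i][j]))   # (2.30)
                v[i][j] = max(-v_max, min(v_max, vij))           # velocity bound
                x[i][j] += v[i][j]                               # (2.29)
            fi = f(x[i])
            if fi < f_star[i]:
                x_star[i], f_star[i] = list(x[i]), fi
                if fi < f_g:
                    g_star, f_g = list(x[i]), fi
    return g_star, f_g


def sphere(x):
    return sum(xi * xi for xi in x)


g_best, f_gbest = pso(sphere, [(-5.0, 5.0)] * 2)
```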



2.4.4 Harmony Search 

Harmony Search (HS) is a relatively new heuristic optimization algorithm and it 
was first developed by Z. W. Geem et al. in 2001 [9]. Harmony search can be explained in more detail with the aid of the discussion of the improvisation process by
a musician. When a musician is improvising, he or she has three possible choices: 
(1) play any famous piece of music (a series of pitches in harmony) exactly from 
his or her memory; (2) play something similar to a known piece (thus adjusting the 
pitch slightly); or (3) compose new or random notes. If we formalize these three op- 
tions for optimization, we have three corresponding components: usage of harmony 
memory, pitch adjusting, and randomization. 

The usage of harmony memory is important, as it is similar to choosing the best-fit individuals in genetic algorithms. This ensures that the best harmonies will be carried over to the new harmony memory. In order to use this memory more effectively,
we can assign a parameter $r_{\mathrm{accept}} \in [0, 1]$, called the harmony memory accepting or considering rate. If this rate is too low, only a few best harmonies are selected and the search may converge too slowly. If this rate is extremely high (near 1), almost all the harmonies are taken from the harmony memory, so other harmonies are not explored well, potentially leading to wrong solutions. Therefore, typically, $r_{\mathrm{accept}} = 0.7 \sim 0.95$.

To adjust the pitch slightly in the second component, we have to use a method that can adjust the frequency efficiently. In theory, the pitch can be adjusted linearly or nonlinearly, but in practice, linear adjustment is used. If $x_{\mathrm{old}}$ is the current solution (or pitch), then the new solution (pitch) $x_{\mathrm{new}}$ is generated by

$$ x_{\mathrm{new}} = x_{\mathrm{old}} + b_p (2\epsilon - 1), \tag{2.31} $$

where $\epsilon$ is a random number drawn from a uniform distribution on $[0, 1]$. Here $b_p$ is the bandwidth, which controls the local range of pitch adjustment. In fact, we can see that the pitch adjustment (2.31) is a random walk.

Pitch adjustment is similar to the mutation operator in genetic algorithms. We can assign a pitch-adjusting rate ($r_{pa}$) to control the degree of the adjustment. If $r_{pa}$ is too low, then there is rarely any change. If it is too high, then the algorithm may not converge at all. Thus, we usually use $r_{pa} = 0.1 \sim 0.5$ in most simulations.

The third component is randomization, which aims to increase the diversity of the solutions. Although adjusting the pitch has a similar role, it is limited to certain local pitch adjustment and thus corresponds to a local search. The use of randomization can drive the system further to explore various regions with high solution
diversity so as to find the global optimality. HS has been applied to solve many 
optimization problems including function optimization, water distribution network, 
groundwater modelling, energy-saving dispatch, structural design, vehicle routing, 
and others. 
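The three components of harmony search fit into a few lines. These are memory consideration with rate $r_{\mathrm{accept}}$, pitch adjustment (2.31) with rate $r_{pa}$ and bandwidth $b_p$, and randomization. The values $r_{\mathrm{accept}} = 0.9$ and $r_{pa} = 0.3$ are within the ranges given above. The harmony memory size of 10, $b_p = 0.1$, the rule of replacing the worst stored harmony, clipping to the bounds, and the sphere test function are assumptions of this sketch.

```python
import random


def harmony_search(f, bounds, hms=10, r_accept=0.9, r_pa=0.3, b_p=0.1,
                   iterations=5000, seed=4):
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(h) for h in memory]
    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < r_accept:                 # use harmony memory
                x = memory[rng.randrange(hms)][j]
                if rng.random() < r_pa:                 # pitch adjustment (2.31)
                    x += b_p * (2.0 * rng.random() - 1.0)
            else:                                       # randomization
                x = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, x)))             # keep within bounds
        f_new = f(new)
        worst = max(range(hms), key=scores.__getitem__)
        if f_new < scores[worst]:                       # replace the worst harmony
            memory[worst], scores[worst] = new, f_new
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    return sum(xi * xi for xi in x)


x_hs, f_hs = harmony_search(sphere, [(-5.0, 5.0)] * 2)
```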



2.4.5 Firefly Algorithm 

Firefly Algorithm (FA) was developed by Xin-She Yang in 2007 [29, 32], and it is based on the flashing patterns and behaviour of fireflies. In essence, FA uses the following three idealized rules:

• Fireflies are unisex so that one firefly will be attracted to other fireflies regardless 
of their sex. 

• The attractiveness is proportional to the brightness, and they both decrease as the distance between fireflies increases. Thus, for any two flashing fireflies, the less bright one will move towards the brighter one. If there is no firefly brighter than a particular firefly, it will move randomly.

• The brightness of a firefly is determined by the landscape of the objective func- 
tion. 

As a firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies, we can now define the variation of attractiveness $\beta$ with the distance $r$ by

$$ \beta = \beta_0 e^{-\gamma r^2}, \tag{2.32} $$

where $\beta_0$ is the attractiveness at $r = 0$ and $\gamma$ is the light absorption coefficient.

The movement of a firefly $i$ attracted to another, more attractive (brighter) firefly $j$ is determined by

$$ x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j^t - x_i^t) + \alpha \epsilon_i^t, \tag{2.33} $$

where the second term is due to the attraction, with $r_{ij}$ being the distance between fireflies $i$ and $j$. The third term is randomization, with $\alpha$ being the randomization parameter and $\epsilon_i^t$ a vector of random numbers drawn from a Gaussian distribution or uniform distribution at time $t$. If $\beta_0 = 0$, it becomes a simple random walk. Furthermore, the randomization $\epsilon_i^t$ can easily be extended to other distributions such as Lévy flights.
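The sketch below is a compact firefly algorithm for minimization, where a lower objective value corresponds to a brighter firefly. It applies (2.32) and (2.33) with a uniform random term scaled by the size of the search domain. Following the second rule, a firefly with no brighter neighbour moves randomly. The parameter values ($\beta_0 = 1$, $\gamma = 0.1$, $\alpha = 0.2$ with a geometric decay factor of 0.97), the $[-5, 5]^2$ domain and the test function are assumptions of this example.

```python
import math
import random


def firefly_algorithm(f, bounds, n=15, beta0=1.0, gamma=0.1, alpha=0.2,
                      alpha_decay=0.97, iterations=150, seed=5):
    rng = random.Random(seed)
    scale = [hi - lo for lo, hi in bounds]
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    light = [f(xi) for xi in x]                 # minimization: lower f means brighter
    k = min(range(n), key=light.__getitem__)
    best, f_best = list(x[k]), light[k]
    for _ in range(iterations):
        for i in range(n):
            moved = False
            for j in range(n):
                if light[j] < light[i]:         # firefly j is brighter: i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
                    beta = beta0 * math.exp(-gamma * r2)               # (2.32)
                    x[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5) * s
                            for a, b, s in zip(x[i], x[j], scale)]      # (2.33)
                    light[i] = f(x[i])
                    moved = True
            if not moved:                       # no brighter firefly: random move
                x[i] = [a + alpha * (rng.random() - 0.5) * s for a, s in zip(x[i], scale)]
                light[i] = f(x[i])
            if light[i] < f_best:
                best, f_best = list(x[i]), light[i]
        alpha *= alpha_decay                    # gradually reduce randomization
    return best, f_best


def sphere(x):
    return sum(xi * xi for xi in x)


x_fa, f_fa = firefly_algorithm(sphere, [(-5.0, 5.0)] * 2)
```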

The Lévy flight essentially provides a random walk whose random step length is drawn from a Lévy distribution

$$ \text{Lévy} \sim u = t^{-\lambda}, \quad (1 < \lambda \le 3), \tag{2.34} $$

which has an infinite variance (and, for $\lambda \le 2$, an infinite mean as well). Here the steps essentially form a random walk process with a power-law step-length distribution with a heavy tail. Some of the new solutions should be generated by a Lévy walk around the best solution obtained so far; this will speed up the local search.
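The text does not specify how Lévy-distributed steps are generated. One widely used option, assumed here, is Mantegna's algorithm. It draws $u \sim N(0, \sigma_u^2)$ and $v \sim N(0, 1)$ and sets $s = u / |v|^{1/\beta}$. This produces steps whose tail decays like $|s|^{-1-\beta}$, that is, $\lambda = 1 + \beta$ in (2.34). The helper `levy_walk_around` and the scale factor 0.01 are hypothetical and only show how such steps could be used around the best solution.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of u in Mantegna's algorithm for a stability index 0 < beta < 2."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta, rng):
    """One heavy-tailed step s = u / |v|^(1/beta); tail ~ |s|^(-1-beta), i.e. lambda = 1 + beta."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def levy_walk_around(best, scale, beta, rng):
    """Hypothetical helper: a new candidate generated by a Levy step around the best solution."""
    return [b + scale * levy_step(beta, rng) for b in best]


rng = random.Random(6)
steps = [levy_step(1.5, rng) for _ in range(10_000)]     # beta = 1.5, i.e. lambda = 2.5
candidate = levy_walk_around([0.5, -0.2], 0.01, 1.5, rng)
```

Most steps are small, but occasional very long jumps occur, which is what lets a Lévy walk combine local refinement with escapes to distant regions.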

A demo implementation of the firefly algorithm, without Lévy flights, can be found at the MathWorks File Exchange web site. The firefly algorithm has attracted much attention [1, 24]. A discrete version of FA can efficiently solve NP-hard scheduling problems [24]. A detailed analysis has also demonstrated the efficiency of FA over a wide range of test problems, including multiobjective load dispatch problems [1].

2.4.6 Cuckoo Search 

Cuckoo search (CS) is one of the latest nature-inspired metaheuristic algorithms, developed in 2009 by Xin-She Yang and Suash Deb [34]. CS is based on the brood parasitism of some cuckoo species. In addition, this algorithm is enhanced by the so-called Lévy flights [21], rather than by simple isotropic random walks. Recent studies show that CS is potentially far more efficient than PSO and genetic algorithms [35].

Cuckoos are fascinating birds, not only because of the beautiful sounds they can make, but also because of their aggressive reproduction strategy. Some species such as the ani and Guira cuckoos lay their eggs in communal nests, though they may remove others' eggs to increase the hatching probability of their own eggs. Quite a number of species engage in obligate brood parasitism by laying their eggs in the nests of other host birds (often other species).

There are three basic types of brood parasitism: intraspecific brood parasitism, cooperative breeding, and nest takeover. Some host birds can engage in direct conflict with the intruding cuckoos. If a host bird discovers that the eggs are not its own, it will either get rid of these alien eggs or simply abandon its nest and build a new nest elsewhere. Some cuckoo species, such as the New World brood-parasitic Tapera, have evolved in such a way that female parasitic cuckoos are often highly specialized in mimicking the colour and pattern of the eggs of a few chosen host species. This reduces the probability of their eggs being abandoned and thus increases their reproductive success.

In addition, the timing of egg-laying of some species is also amazing. Parasitic cuckoos often choose a nest where the host bird has just laid its own eggs. In general, the cuckoo eggs hatch slightly earlier than their host eggs. Once the first cuckoo chick is hatched, its first instinctive action is to evict the host eggs by blindly propelling them out of the nest, which increases the cuckoo chick's share of the food provided by its host bird. Studies also show that a cuckoo chick can mimic the call of host chicks to gain access to more feeding opportunities.

For simplicity in describing the Cuckoo Search, we now use the following three 
idealized rules: 

• Each cuckoo lays one egg at a time, and dumps it in a randomly chosen nest; 

• The best nests with high-quality eggs will be carried over to the next generations; 

• The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability $p_a \in [0, 1]$. In this case, the host bird can either get rid of the egg, or simply abandon the nest and build a completely new nest.

As a further approximation, this last assumption can be implemented by replacing a fraction $p_a$ of the $n$ host nests with new nests (with new random solutions).
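A minimal Python sketch of this approximation follows, assuming a minimization problem with box bounds. Like the pseudo code in Fig. 2.1, it rebuilds the worst nests; the count $\lfloor p_a n \rfloor$, the uniform re-initialization and the name `abandon_worst` are illustrative assumptions.

```python
import random


def abandon_worst(nests, f, pa=0.25, lower=-5.0, upper=5.0, rng=random):
    """Replace the worst fraction pa of nests by new random solutions.

    Assumes minimisation (larger f(x) = worse nest) and a box [lower, upper].
    Returns a new list; the input nests are left unchanged.
    """
    n, dim = len(nests), len(nests[0])
    k = int(pa * n)                                     # nests to abandon
    order = sorted(range(n), key=lambda i: f(nests[i]))  # best first
    new = [list(x) for x in nests]
    for i in order[n - k:]:                             # the k worst nests
        new[i] = [rng.uniform(lower, upper) for _ in range(dim)]
    return new
```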

For a maximization problem, the quality or fitness of a solution can simply be 
proportional to the value of the objective function. Other forms of fitness can be 
defined in a similar way to the fitness function in genetic algorithms. 

From the implementation point of view, we can use the following simple representations: each egg in a nest represents a solution, and each cuckoo can lay only one egg (thus representing one solution). The aim is to use the new and potentially better solutions (cuckoos) to replace a not-so-good solution in the nests. Obviously, this algorithm can be extended to the more complicated case where each nest has multiple eggs representing a set of solutions. In this work, we will use the simplest approach, where each nest has only a single egg. In this case, there is no distinction between egg, nest or cuckoo, as each nest corresponds to one egg, which also represents one cuckoo.

Based on these three rules, the basic steps of the Cuckoo Search (CS) can be summarized as the pseudo code shown in Fig. 2.1.

When generating new solutions $x_i^{(t+1)}$ for, say, a cuckoo $i$, a Lévy flight is performed:

$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \text{L\'evy}(\lambda), \tag{2.35}$$

where $\alpha > 0$ is the step size, which should be related to the scale of the problem of interest. In most cases, we can use $\alpha = O(L/10)$, where $L$ is the characteristic scale of the problem of interest, while in some cases $\alpha = O(L/100)$ can be more effective and avoid flying too far. Equation (2.35) is essentially the stochastic equation for a random walk. In general, a random walk is a Markov chain whose next status/location depends only on the current location (the first term in the above equation) and the transition probability (the second term). The product $\oplus$ means entrywise multiplication. This entrywise product is similar to those used in PSO, but here the random walk via Lévy flight is more efficient in exploring the search space, as its step length is much longer in the long run. However, a substantial fraction of the new solutions should be generated by far-field randomization, with locations far enough from the current best solution; this makes sure that the system will not be trapped in a local optimum [35].

Objective function f(x), x = (x_1, ..., x_d)^T
Generate initial population of n host nests x_i (i = 1, 2, ..., n)
while (t < MaxGeneration) or (stop criterion)
  Get a cuckoo randomly/generate a solution by Lévy flights
    and then evaluate its quality/fitness F_i
  Choose a nest among n (say, j) randomly
  if (F_i > F_j),
    Replace j by the new solution
  end
  A fraction (p_a) of worse nests are abandoned
    and new ones/solutions are built/generated
  Keep best solutions (or nests with quality solutions)
  Rank the solutions and find the current best
end while

Fig. 2.1 Pseudo code of the Cuckoo Search (CS)

The pseudo code given here is sequential; however, vectors should be used from an implementation point of view, as vectors are more efficient than loops. A Matlab implementation is given by the author and can be downloaded.
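For readers without Matlab, here is a minimal standard-library Python sketch of the loop in Fig. 2.1 for minimization. It is not the author's implementation, and several choices are assumptions: one cuckoo is generated per generation by the Lévy flight (2.35) with $\alpha = (U - L)/100$; Lévy steps come from Mantegna's algorithm with $\beta = 1.5$ (so $\lambda = 2.5$); solutions are clipped to the box; and the worst fraction $p_a$ of nests is rebuilt every generation. For clarity it follows the sequential pseudo code rather than the vectorized form recommended above.

```python
import math
import random


def levy_step(dim, beta, rng):
    """Mantegna's algorithm: heavy-tailed step with exponent lambda = 1 + beta."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [rng.gauss(0.0, sigma) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)
            for _ in range(dim)]


def cuckoo_search(f, dim, n=15, pa=0.25, iters=300, lower=-5.0, upper=5.0,
                  beta=1.5, seed=0):
    """Minimise f over [lower, upper]^dim; returns (best_x, best_f)."""
    rng = random.Random(seed)
    alpha = (upper - lower) / 100           # step size alpha = O(L/100)

    def clip(v):
        return min(max(v, lower), upper)

    nests = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in nests]
    n_abandon = int(pa * n)                 # nests rebuilt per generation
    for _ in range(iters):
        # Get a cuckoo: Levy flight (2.35) from a randomly chosen nest i
        i = rng.randrange(n)
        step = levy_step(dim, beta, rng)
        new = [clip(x + alpha * s) for x, s in zip(nests[i], step)]
        f_new = f(new)
        # Choose a nest j at random; replace it if the new egg is better
        j = rng.randrange(n)
        if f_new < fit[j]:
            nests[j], fit[j] = new, f_new
        # Abandon the worst fraction pa and build new random nests;
        # for pa < 1 the best nest is never among them, so it is kept
        order = sorted(range(n), key=fit.__getitem__)
        for k in order[n - n_abandon:]:
            nests[k] = [rng.uniform(lower, upper) for _ in range(dim)]
            fit[k] = f(nests[k])
    best = min(range(n), key=fit.__getitem__)
    return nests[best], fit[best]
```

For example, `cuckoo_search(lambda x: sum(v * v for v in x), dim=2)` searches the two-dimensional sphere function. Because the best nest is never abandoned, the returned objective value can only stay the same or improve as `iters` grows.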

2.5 A Unified Approach to Metaheuristics 
2.5.1 Characteristics of Metaheuristics 

There are many other metaheuristic algorithms which are equally popular and powerful, and these include Tabu search [11], ant colony optimization [7], artificial immune system [8], bee algorithms, bat algorithm [33] and others [18, 31].

The efficiency of metaheuristic algorithms can be attributed to the fact that they 
imitate the best features in nature, especially the selection of the fittest in biological 
systems which have evolved by natural selection over millions of years. 

Two important characteristics of metaheuristics are intensification and diversification. Intensification intends to search locally and more intensively, while diversification makes sure the algorithm explores the search space globally (hopefully also efficiently).

Furthermore, intensification is also called exploitation, as it typically searches 
around the current best solutions and selects the best candidates or solutions. Simi- 
larly, diversification is also called exploration, as it tends to explore the search space 
more efficiently, often by large-scale randomization. 

A fine balance between these two components is very important to the overall efficiency and performance of an algorithm. Too little exploration and too much exploitation could cause the system to be trapped in local optima, which makes it very difficult or even impossible to find the global optimum. On the other hand, if there is too much exploration but too little exploitation, it may be difficult for the system to converge, which slows down the overall search. Finding a proper balance is itself an optimization problem, and one of the main tasks in designing new algorithms is to strike a good balance in this tradeoff.

Furthermore, exploitation and exploration alone are not enough. During the search, we have to use a proper mechanism or criterion to select the best solutions. The most common criterion is the survival of the fittest, that is, to keep updating the current best found so far. In addition, some form of elitism is often used to ensure that the best or fittest solutions are not lost and are passed on to the next generations.

2.6 Generalized Evolutionary Walk Algorithm (GEWA) 

From the above discussion of all the major components and their characteristics, we can see that a good combination of local search and global search with a proper selection mechanism should produce a good metaheuristic algorithm, whatever name it may be given.

In principle, the global search should be carried out more frequently at the initial 
stage of the search or iterations. Once a number of good quality solutions are found, 
exploration should be sparse on the global scale, but frequent enough so as to escape 
any local trap if necessary. On the other hand, the local search should be carried out 
as efficiently as possible, so a good local search method should be used. The proper
balance of these two is paramount. 

Using these basic components, we can now design a generic metaheuristic algorithm for optimization, which we call the Generalized Evolutionary Walk Algorithm (GEWA), first formulated by Yang [30] in 2010. An evolutionary walk is a random walk, but with a biased selection towards optimality. This is a generalized framework for global optimization.

There are three major components in this algorithm: 1) global exploration by randomization, 2) intensive local search by random walk, and 3) the selection of the best with some elitism. The pseudo code of GEWA is shown in Fig. 2.2. The random walk should be carried out around the current global best $g_*$ so as to exploit the system information, such as the current best, more effectively. We have

$$x^{t+1} = g_* + w, \tag{2.36}$$

and

$$w = \epsilon d, \tag{2.37}$$

where $\epsilon$ is drawn from a Gaussian (normal) distribution $N(0, \sigma^2)$, and $d$ is the step length vector, which should be related to the actual scales of the independent variables. For simplicity, we can take $\sigma = 1$.
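Equations (2.36) and (2.37) translate almost directly into code. The Python sketch below assumes that $\epsilon$ is drawn independently for each design variable and that the product $\epsilon d$ is taken entrywise; the name `local_walk` is hypothetical.

```python
import random


def local_walk(g_best, d, sigma=1.0, rng=random):
    """Random walk around the current best, Eqs. (2.36)-(2.37).

    Returns g* + w with w_k = eps_k * d_k, where eps_k ~ N(0, sigma^2) is drawn
    independently for each design variable k.
    """
    return [g + rng.gauss(0.0, sigma) * dk for g, dk in zip(g_best, d)]
```

A natural choice for `d` is a small fraction of each variable's range, for example `d = [0.01 * (u - l) for l, u in zip(lower, upper)]`, in line with the step-size guidance given for GEWA.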



Initialize a population of n walkers x_i (i = 1, 2, ..., n);
Evaluate fitness F_i of n walkers & find the current best g_*;
while (t < MaxGeneration) or (stop criterion);
  Discard the worst solution and replace it by (2.38) or (2.39);
  if (rand < α),
    Local search: random walk around the best
    x^{t+1} = g_* + ε d                          (2.38)
  else
    Global search: randomization (uniform, Lévy flights, etc.)
    x^{t+1} = L + (U - L) ε_u   (uniform)         (2.39)
  end
  Evaluate new solutions and find the current best g_*^t;
  t = t + 1;
end while
Postprocess results and visualization;

Fig. 2.2 Generalized Evolutionary Walk Algorithm (GEWA)



The randomization step can be achieved by

$$x^{t+1} = L + (U - L)\, \epsilon_u, \tag{2.40}$$

where $\epsilon_u$ is drawn from a uniform distribution Unif[0, 1]. $U$ and $L$ are the upper and lower bound vectors, respectively.
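A short Python sketch of Eq. (2.40) follows, assuming that a separate uniform number is drawn for each component, so that the product is entrywise. Python's `random.random()` samples the half-open interval [0, 1), which makes no practical difference here. The name `global_uniform` is hypothetical.

```python
import random


def global_uniform(lower, upper, rng=random):
    """Uniform randomization, Eq. (2.40): x_k = L_k + (U_k - L_k) * eps_k."""
    return [l + (u - l) * rng.random() for l, u in zip(lower, upper)]
```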

Typically, $\alpha \approx 0.25$ to $0.7$. We will use $\alpha = 0.5$ in our implementation. Interested readers can try some parametric studies.

Again, two important issues are: 1) the balance of intensification and diversification, controlled by the single parameter $\alpha$, and 2) the choice of the step size of the random walk. The choice of the right step size is also important. Simulations suggest that the ratio of the step size to its length scale can typically be around 0.001 to 0.01 for most applications.

Another important issue is the selection of the best and/or elitism, as we intend to discard the worst solution and replace it by a newly generated solution. This may implicitly weed out the least-fit solutions, while the solution with the highest fitness remains in the population. The selection of the best and elitism can thus be guaranteed implicitly in the evolutionary walkers.

Furthermore, the number ($n$) of random walkers is also important. Too few walkers are not efficient, while too many may lead to slow convergence. In general, the choice of $n$ should follow guidelines similar to those for all population-based algorithms. Typically, we can use $n = 15$ to $50$ for most applications.
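Putting the pieces of Fig. 2.2 together, here is a minimal Python sketch of GEWA for minimization that uses the parameter values suggested above ($\alpha = 0.5$, step ratio 0.01, $\sigma = 1$). Assumptions not in the text: local moves are clipped to the box, the discarded walker is the worst one other than the current best (so elitism holds for $n \ge 2$), and the names are hypothetical.

```python
import random


def gewa_minimize(f, lower, upper, n=20, alpha=0.5, step_ratio=0.01,
                  iters=1000, seed=0):
    """Minimise f over the box [lower, upper] following Fig. 2.2 (needs n >= 2).

    lower, upper: per-variable bounds (lists of equal length).
    alpha: probability of a local walk around the best (else a global draw).
    step_ratio: step length d_k as a fraction of the range U_k - L_k.
    """
    rng = random.Random(seed)
    d = [step_ratio * (u - l) for l, u in zip(lower, upper)]
    walkers = [[rng.uniform(l, u) for l, u in zip(lower, upper)] for _ in range(n)]
    fit = [f(x) for x in walkers]
    best = min(range(n), key=fit.__getitem__)
    for _ in range(iters):
        g = walkers[best]
        if rng.random() < alpha:
            # Local search (2.38): Gaussian walk around the best, sigma = 1,
            # clipped to the box (clipping is an added assumption)
            x = [min(max(gk + rng.gauss(0.0, 1.0) * dk, l), u)
                 for gk, dk, l, u in zip(g, d, lower, upper)]
        else:
            # Global search (2.39): uniform randomization inside [L, U]
            x = [l + (u - l) * rng.random() for l, u in zip(lower, upper)]
        # Discard the worst walker (never the current best) and replace it
        worst = max((k for k in range(n) if k != best), key=fit.__getitem__)
        walkers[worst], fit[worst] = x, f(x)
        if fit[worst] < fit[best]:
            best = worst
    return walkers[best], fit[best]
```

Because the current best is never discarded, the returned objective value is non-increasing in `iters` for a fixed seed, which is the implicit elitism described above.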

2.6.1 To Be Inspired or Not to Be Inspired

We have seen that nature-inspired algorithms are always based on a particular (often the most successful) mechanism of the natural world. Having evolved over billions of years, nature has found almost perfect solutions to every problem she has met. Almost all the not-so-good solutions have been discarded via natural selection. The optimal solutions seem (often after a huge number of generations) to appear at the evolutionarily stable equilibrium, even though we may not understand how the perfect solutions are reached. When we try to solve engineering problems, why not try to be inspired by nature's success? The simple answer to the question 'To be inspired or not to be inspired?' is 'why not?'. If we do not have good solutions at hand, it is always a good idea to learn from nature.

Nature provides almost unlimited ways of problem-solving. If we observe carefully, we are surely inspired to develop more powerful and efficient new-generation algorithms. Intelligence is a product of biological evolution in nature. Ultimately, intelligent algorithms (or systems) may appear in the future that can evolve and optimally adapt to solve NP-hard optimization problems efficiently and intelligently.
Chapter 3
Surrogate-Based Methods 

Slawomir Koziel, David Echeverria Ciaurri, and Leifur Leifsson 



Abstract. Objective functions that appear in engineering practice may come from measurements of physical systems and, more often, from computer simulations. In many cases, optimization of such objectives in a straightforward way, i.e., by applying optimization routines directly to these functions, is impractical. One reason is that simulation-based objective functions are often analytically intractable (discontinuous, non-differentiable, and inherently noisy). Also, sensitivity information is usually unavailable, or too expensive to compute. Another, and in many cases even more important, reason is the high computational cost of measurements/simulations. Simulation times of several hours, days or even weeks per objective function evaluation are not uncommon in contemporary engineering, despite the increase in available computing power. Feasible handling of these unmanageable functions can be accomplished using surrogate models: the optimization of the original objective is replaced by iterative re-optimization and updating of the analytically tractable and computationally cheap surrogate. This chapter briefly describes the basics of surrogate-based optimization, various ways of creating surrogate models, as well as several examples of surrogate-based optimization techniques.

Keywords: Surrogate-based optimization, multi-fidelity optimization, surrogate 
models, simulation-driven design, trust-region methods, function approximation, 
design of experiments. 
3.1 Introduction

Contemporary engineering is more and more dependent on computer-aided design 
(CAD). In most engineering fields, numerical simulations are used extensively, 
not only for design verification but also directly in the design process. As a matter 
of fact, because of increasing system complexity, ready-to-use theoretical (e.g., 
analytical) models are not available in many cases. Thus, simulation-driven design 
and design optimization become the only option to meet the specifications
prescribed, improve the system reliability, or reduce the fabrication cost. 

The simulation-driven design can be formulated as a nonlinear minimization problem of the following form

$$x^* = \arg\min_x f(x), \tag{3.1}$$

where $f(x)$ denotes the objective function to be minimized, evaluated at the point $x \in \mathbb{R}^n$ ($x$ is the design variable vector). In many engineering problems $f$ is of the form $f(x) = U(R_f(x))$, where $R_f \in \mathbb{R}^m$ denotes the response vector of the system of interest (in particular, one may have $m > n$ or even $m \gg n$ [1]), whereas $U$ is a given scalar merit function. In particular, $U$ can be defined through a norm that measures the distance between $R_f(x)$ and a target vector $y$. An optimal design vector is denoted by $x^*$. In many cases, $R_f$ is obtained through computationally expensive computer simulations. We will refer to it as a high-fidelity or fine model. To simplify notation, $f$ itself will also be referred to as the high-fidelity (fine) model.
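As a small illustration of $f(x) = U(R_f(x))$, the Python sketch below takes $U$ to be the Euclidean distance to the target $y$. The function `toy_model`, which returns $m = 3$ responses for $n = 2$ design variables, is a hypothetical stand-in for an expensive simulator.

```python
import math


def make_objective(model, target):
    """Build f(x) = U(R_f(x)), with U the Euclidean distance to the target y.

    model: any callable returning the response vector R_f(x).
    """
    def f(x):
        return math.dist(model(x), target)
    return f


def toy_model(x):
    """Hypothetical cheap 'simulator' with m = 3 responses for n = 2 variables."""
    return [x[0] + x[1], x[0] - x[1], x[0] * x[1]]


f = make_objective(toy_model, target=[2.0, 0.0, 1.0])
```

Here `f([1.0, 1.0])` is zero because the response matches the target exactly, so $x^* = (1, 1)$ solves (3.1) for this toy problem.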

Unfortunately, a direct attempt to solve (3.1) by embedding the simulator directly in the optimization loop may be impractical. The underlying simulations can be very time-consuming (in some instances, the simulation time can be as long as several hours, days or even weeks per single design), and the availability of massive computing resources does not always translate into computational speedup. This latter fact is due to a growing demand for simulation accuracy, both by including multiphysics and second-order effects, and by using finer discretization of the structure under consideration. As conventional optimization algorithms (e.g., gradient-based schemes with numerical derivatives) require tens, hundreds or even thousands of objective function calls per run (depending on the number of design variables), the computational cost of the whole optimization process may not be acceptable.

Another problem is that objective functions coming from computer simulations 
are often analytically intractable (i.e., discontinuous, non-differentiable, and in- 
herently noisy). Moreover, sensitivity information is frequently unavailable, or too 
expensive to compute. While in some cases it is possible to obtain derivative in- 
formation inexpensively through adjoint sensitivities [2], numerical noise is an 
important issue that can complicate simulation-driven design. We should also 
mention that adjoint-based sensitivities require detailed knowledge of and access 
to the simulator source code, and this is something that cannot be assumed to be 
generally available. 

Surrogate-based optimization (SBO) [1,3,4] has been suggested as an effective 
approach for the design with time-consuming computer models. The basic concept 
of SBO is that the direct optimization of the computationally expensive model is
replaced by an iterative process that involves the creation, optimization and updat- 
ing of a fast and analytically tractable surrogate model. The surrogate should be a 
reasonably accurate representation of the high-fidelity model, at least locally. The 
design obtained through optimizing the surrogate model is verified by evaluating 
the high-fidelity model. The high-fidelity model data obtained in this verification 
process is then used to update the surrogate. SBO proceeds in this predictor- 
corrector fashion iteratively until some termination criterion is met. Because most 
of the operations are performed on the surrogate model, SBO reduces the compu- 
tational cost of the optimization process when compared to optimizing the high- 
fidelity model directly, without resorting to any surrogate. 

In this chapter, we review the basics of surrogate-based optimization. We briefly present various ways of generating surrogate models, and we emphasize the distinction between models based on function approximations of sampled high-fidelity model data and models constructed from physically-based low-fidelity models. A few selected surrogate-based optimization algorithms, including space mapping [1,5,6], approximation model management [7], manifold mapping [8], and the surrogate management framework [9], are also discussed. We conclude the chapter with some final remarks.

3.2 Surrogate-Based Optimization 

As mentioned in the introduction, there are several reasons why the straightfor- 
ward optimization of the high-fidelity model may not work or can be impractical. 
These reasons include high computational cost of each model evaluation, numeri- 
cal noise and discontinuities in the cost function. Surrogate-based optimization 
[3,5] aims at alleviating such problems by using an auxiliary model, the surrogate, 
that is preferably fast, amenable to optimization, and yet reasonably accurate. One 
popular approach for constructing surrogate models is through approximations of 
high-fidelity model data obtained by sampling the design space using appropriate 
design of experiments methodologies [3]. Some of these strategies for allocating 
samples [10], generating approximations [3,4,10], as well as validating the surro- 
gates are discussed in Section 3.3. 

The surrogate model optimization yields an approximation of the minimizer associated with the high-fidelity model. This approximation has to be verified by evaluating the high-fidelity model at the predicted high-fidelity model minimizer. Depending on the result of this verification, the optimization process may be terminated. Otherwise, the surrogate model is updated using the newly available high-fidelity model data, and then re-optimized to obtain a new, and hopefully better, approximation of the high-fidelity model minimizer.

The surrogate-based optimization process can be summarized as follows 
(Fig. 3.1): 

1. Generate the initial surrogate model.

2. Obtain an approximate solution to (3.1) by optimizing the surrogate.

3. Evaluate the high-fidelity model at the approximate solution computed in Step 2.

4. Update the surrogate model using the new high-fidelity model data.

5. Stop if the termination condition is satisfied; otherwise go to Step 2.
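To make Steps 1 to 5 concrete, here is a deliberately small one-dimensional Python sketch. Two choices are illustrative assumptions rather than a method prescribed in this chapter: the surrogate is the interpolating quadratic through the three best high-fidelity samples, and the loop stops when the surrogate's minimizer coincides (within `tol`) with a design that has already been evaluated. Each iteration costs one high-fidelity evaluation, as in Fig. 3.1.

```python
def parabola_argmin(xs, fs, lo, hi):
    """Minimise the quadratic through three points (Newton form) over [lo, hi]."""
    (x1, x2, x3), (f1, f2, f3) = xs, fs
    c1 = (f2 - f1) / (x2 - x1)
    c2 = (f3 - f2) / (x3 - x2)
    a = (c2 - c1) / (x3 - x1)                   # curvature coefficient

    def p(x):
        return f1 + c1 * (x - x1) + a * (x - x1) * (x - x2)

    if a > 0:                                   # convex: clipped vertex
        return min(max(0.5 * (x1 + x2) - c1 / (2 * a), lo), hi)
    return lo if p(lo) <= p(hi) else hi         # otherwise the better bound


def sbo_minimize(f, lo, hi, x_init, max_iter=20, tol=1e-8):
    """Steps 1-5 with a quadratic surrogate; returns (x_best, f_best, n_evals)."""
    data = {x: f(x) for x in x_init}            # Step 1: initial samples of f
    for _ in range(max_iter):
        pts = sorted(data, key=data.get)[:3]    # surrogate from the 3 best samples
        x_new = parabola_argmin(pts, [data[x] for x in pts], lo, hi)  # Step 2
        if min(abs(x_new - x) for x in data) < tol:
            break                               # Step 5: no new design proposed
        data[x_new] = f(x_new)                  # Steps 3-4: evaluate f, update data
    x_best = min(data, key=data.get)
    return x_best, data[x_best], len(data)
```

For a quadratic $f$ the surrogate is exact, so the loop stops after a single new high-fidelity evaluation. For general $f$ it behaves like successive parabolic interpolation and carries none of the convergence guarantees discussed below.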

The SBO framework can be formulated as an iterative procedure [3,5]:

$$x^{(i+1)} = \arg\min_x s^{(i)}(x). \tag{3.2}$$

This scheme generates a sequence of points (designs) $x^{(i)}$ that (hopefully) converge to a solution (or a good approximation) of the original design problem (3.1). Each $x^{(i+1)}$ is the optimal design of the surrogate model $s^{(i)}$, which is assumed to be a computationally cheap and sufficiently reliable representation of the fine model $f$, particularly in the neighborhood of the current design $x^{(i)}$. Under these assumptions, the algorithm (3.2) aims at generating a sequence of designs that quickly approaches $x^*$. Typically, and for verification purposes, the high-fidelity model is evaluated only once per iteration (at every new design $x^{(i+1)}$). The data obtained from the validation is used to update the surrogate model. Because the surrogate model is computationally cheap, the optimization cost associated with (3.2) can, in many cases, be viewed as negligible, so that the total optimization cost is determined by the evaluation of the high-fidelity model. The number of iterations needed within a surrogate-based optimization algorithm is often substantially smaller than for any method that optimizes the high-fidelity model directly (e.g., gradient-based schemes with numerical derivatives) [5].

If the surrogate model satisfies zero- and first-order consistency conditions with the high-fidelity model (i.e., $s^{(i)}(x^{(i)}) = f(x^{(i)})$ and $\nabla s^{(i)}(x^{(i)}) = \nabla f(x^{(i)})$ [7]; note that verifying the latter requires high-fidelity model sensitivity data), and the surrogate-based algorithm is enhanced by, for example, a trust region method [11] (see Section 3.4.1), then the sequence of intermediate solutions is provably convergent to a local optimizer of the fine model [12] (some standard assumptions concerning the smoothness of the functions involved are also necessary) [13]. Convergence can also be guaranteed if the SBO algorithm is embedded within the framework given in [5,14] (space mapping), [13] (manifold mapping) or [9] (surrogate management framework). A more detailed description of several surrogate-based optimization techniques is given in Section 3.4.
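One common way to enforce both consistency conditions on an arbitrary cheap model $c(x)$ is a first-order additive correction around the current design, $s(x) = c(x) + [f(x^{(i)}) - c(x^{(i)})] + [\nabla f(x^{(i)}) - \nabla c(x^{(i)})]^T (x - x^{(i)})$. The Python sketch below builds such a corrected surrogate. The correction form, the function name and the assumption that $\nabla f(x^{(i)})$ is available are illustrative; the text above states only the conditions, not this particular construction.

```python
def first_order_corrected(c, grad_c, xi, f_xi, grad_f_xi):
    """Return s(x) = c(x) + [f(xi) - c(xi)] + (grad f(xi) - grad c(xi))^T (x - xi).

    By construction s(xi) = f(xi) and grad s(xi) = grad f(xi).
    """
    d0 = f_xi - c(xi)                                        # zero-order mismatch
    d1 = [gf - gc for gf, gc in zip(grad_f_xi, grad_c(xi))]  # gradient mismatch

    def s(x):
        return c(x) + d0 + sum(g * (xk - xik) for g, xk, xik in zip(d1, x, xi))

    return s
```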

Space mapping [1,5,6] is an example of a surrogate-based methodology that does not normally rely on using sensitivity data or trust region convergence safeguards; however, it requires the surrogate model to be constructed from a physically-based coarse model [1]. This usually gives remarkably good performance in the sense of the algorithm being able to locate a satisfactory design quickly. Unfortunately, space mapping suffers from convergence problems [14], and it is sensitive to the quality of the coarse model and the specific analytical formulation of the surrogate [15,16].
Fig. 3.1 Flowchart of the surrogate-based optimization process. An approximate high-fidelity
model minimizer is obtained iteratively by optimizing the surrogate model. The high-fidelity 
model is evaluated at each new design for verification purposes. If the termination condition is 
not satisfied, the surrogate model is updated and the search continues. In most cases the high- 
fidelity model is evaluated only once per iteration. The number of iterations needed in SBO is 
often substantially smaller than for conventional (direct) optimization techniques. 



3.3 Surrogate Models 



The surrogate model is a key component of any SBO algorithm. It has to be com- 
putationally cheap, preferably smooth, and, at the same time, reasonably accurate, 
so that it can be used to predict the location of high-fidelity model minimizers. We 
can clearly distinguish between physical and functional surrogate models. 

Physical (or physically-based) surrogates are constructed from an underlying 
low-fidelity (coarse) model. The low-fidelity model is a representation of the sys- 
tem of interest with relaxed accuracy [1]. Coarse models are computationally 
cheaper than high-fidelity models and, in many cases, have better analytical prop- 
erties. The low-fidelity model can be obtained, for example, from the same simu- 
lator as the one used for the high-fidelity model but using a coarse discretization 
[17]. Alternatively, the low-fidelity model can be based on simplified physics 
(e.g., by exploiting simplified equations [1], or by neglecting certain second-order 
effects) [18], or on a significantly different physical description (e.g., lumped pa- 
rameter versus partial differential equation based models [1]). In some cases, low- 
fidelity models can be formulated using analytical or semi-empirical formulas 
[19]. The coarse model can be corrected if additional data from the high-fidelity 
model is available (for example, during the course of the optimization). 

In general, physical surrogate models are: 

• based on particular knowledge about the physical system of interest, 

• dedicated (reuse across different designs is uncommon), 

• more expensive to evaluate and more accurate (in a global sense) than 
functional surrogates. 
It should be noted that the evaluation of a physical surrogate may involve, for example, the numerical solution of partial differential equations or even actual measurements of the physical system.

The main advantage of physically-based surrogates is that the amount of high- 
fidelity model data necessary for obtaining a given level of accuracy is generally 
substantially smaller than for functional surrogates (physical surrogates inherently 
embed knowledge about the system of interest) [1]. Hence, surrogate-based opti- 
mization algorithms that exploit physically-based surrogate models are usually 
more efficient than those using functional surrogates (in terms of the number of 
high-fidelity model evaluations required to find a satisfactory design) [5]. 

Functional (or approximation) surrogate models [20,4]: 

• can be constructed without previous knowledge of the physical system of 
interest, 

• are generic, and therefore applicable to a wide class of problems, 

• are based on (usually simple) algebraic models, 

• are often very cheap to evaluate but require a considerable amount of data to ensure reasonable general accuracy.

An initial functional surrogate can be generated using high-fidelity model data ob- 
tained through sampling of the design space. Figure 3.2 shows the model construc- 
tion flowchart for a functional surrogate. Design of experiments involves the use 
of strategies for allocating samples within the design space. The particular choice 
depends on the number of samples one can afford (on some occasions only a few
points may be allowed), but also on the specific modeling technique that will be 
used to create the surrogate. Though in some cases the surrogate can be found us- 
ing explicit formulas (e.g., polynomial approximation) [3], in most situations it is 
computed by means of a separate minimization problem (e.g., when using kriging 
[21] or neural networks [22]). The accuracy of the model should be tested in order 
to estimate its prediction/generalization capability. The main difficulty in obtain- 
ing a good functional surrogate lies in keeping a balance between accuracy at the 
known and at the unknown data (training and testing set, respectively). The surro- 
gate could be subsequently updated using new high-fidelity model data that is ac- 
cumulated during the run of the surrogate-based optimization algorithm. 
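The Python sketch below illustrates the simplest explicit-formula case together with the training/testing split mentioned above. A least-squares straight line is fitted to samples of a stand-in function $f(x) = x^2$, and its accuracy is estimated by the root-mean-square error on a separate testing set. The stand-in function, the sample locations and the function names are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a + b*x from explicit formulas (no solver needed)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x


def rmse(model, xs, ys):
    """Root-mean-square prediction error of model on the points (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def f(x):
    """Stand-in for the high-fidelity model."""
    return x * x


train = [0.0, 0.25, 0.5, 0.75, 1.0]        # training samples
test = [0.125, 0.375, 0.625, 0.875]        # separate testing samples
model = fit_line(train, [f(x) for x in train])
train_err = rmse(model, train, [f(x) for x in train])
test_err = rmse(model, test, [f(x) for x in test])
```

With these samples the fitted line is $x - 0.125$ and `test_err` is 0.078125. A linear model cannot capture the curvature of $x^2$, so adding more data points or switching to a richer model would be the next step.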

In this section we first describe the fundamental steps for generating functional 
surrogates. Various sampling techniques are presented in Section 3.3.1. The surrogate creation
and model validation steps are tackled in Section 3.3.2 and Section 3.3.3, respectively.
If the quality of the surrogate is not sufficient, more data
points can be added, and/or the model parameters can be updated to improve accu- 
racy. Several correction methods, both for functional and physical surrogates, are 
described in Section 3.3.4. 
Fig. 3.2 Surrogate model construction flowchart. If the quality of the model is not satisfactory, the procedure can be iterated (more data points will be required).



3.3.1 Design of Experiments 



Design of experiments (DOE) [23,24,25] is a strategy for allocating samples 
(points) in the design space that aims at maximizing the amount of information 
acquired. The high-fidelity model is evaluated at these points to create the training 
data set that is subsequently used to construct the functional surrogate model. 
When sampling, there is a clear trade-off between the number of points used and 
the amount of information that can be extracted from these points. The samples 
are typically spread apart as much as possible in order to capture global trends in 
the design space. 

Factorial designs [23] are classical DOE techniques that, when applied to discrete design variables, explore a large region of the search space. The sampling of all possible combinations is called full factorial design. Fractional factorial designs can be used when model evaluation is expensive and the number of design variables is large (in full factorial design the number of samples increases exponentially with the number of design variables). Continuous variables, once discretized, can be easily analyzed through factorial design. Full factorial two-level and three-level designs (also known as $2^k$ and $3^k$ designs) allow us to estimate main effects and interactions between design variables, and quadratic effects and interactions, respectively. Figures 3.3(a) and 3.3(b) show examples of full two-level and fractional two-level design, respectively, for three design variables (i.e., $n = 3$). Alternative factorial designs can be found in practice: central composite design (see Figure 3.3(c)), star design (frequently used in combination with space mapping [26]; see Figure 3.3(d)), or Box-Behnken design (see Figure 3.3(e)).
Fig. 3.3 Factorial designs for three design variables (n = 3): (a) full factorial design, (b) fractional factorial design, (c) central composite design, (d) star design, and (e) Box-Behnken design.
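As an illustration, the designs of Fig. 3.3 can be generated in coded units (levels -1, 0, +1) with a few lines of Python. This is a minimal sketch with hypothetical function names. The half fraction uses the generator $x_n = x_1 x_2 \cdots x_{n-1}$, which is one common choice among several. The central composite and star designs use an axial distance α = 1 by default.

```python
import itertools

import numpy as np


def full_factorial(levels, n):
    """All combinations of the given coded levels for n design variables."""
    return np.array(list(itertools.product(levels, repeat=n)), dtype=float)


def half_fraction(n):
    """Two-level 2^(n-1) fractional design (generator x_n = x_1 * ... * x_{n-1})."""
    base = full_factorial([-1.0, 1.0], n - 1)
    return np.column_stack([base, np.prod(base, axis=1)])


def star(n, alpha=1.0):
    """Centre point plus 2n axial points at distance alpha along each axis."""
    axial = alpha * np.vstack([np.eye(n), -np.eye(n)])
    return np.vstack([np.zeros((1, n)), axial])


def central_composite(n, alpha=1.0):
    """Two-level full factorial corners combined with a star design."""
    return np.vstack([full_factorial([-1.0, 1.0], n), star(n, alpha)])


def box_behnken(n):
    """Edge midpoints (one pair of variables at +-1, the rest at 0) plus centre."""
    points = []
    for i, j in itertools.combinations(range(n), 2):
        for a, b in itertools.product([-1.0, 1.0], repeat=2):
            x = np.zeros(n)
            x[i], x[j] = a, b
            points.append(x)
    points.append(np.zeros(n))
    return np.array(points)


for name, design in [("2^3 full", full_factorial([-1.0, 1.0], 3)),
                     ("3^3 full", full_factorial([-1.0, 0.0, 1.0], 3)),
                     ("2^(3-1) fraction", half_fraction(3)),
                     ("central composite", central_composite(3)),
                     ("star", star(3)),
                     ("Box-Behnken", box_behnken(3))]:
    print(f"{name:18s} {design.shape[0]:3d} points")
```

For n = 3 this prints 8, 27, 4, 15, 7 and 13 points, respectively. The growth of the full designs ($2^n$ and $3^n$ points) is what makes fractional designs attractive when n is large.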

If no prior knowledge about the objective function is available (typical while constructing the initial surrogate), some recent DOE approaches tend to allocate the samples uniformly within the design space [3]. A variety of space-filling designs are available. The simplest ones do not ensure sufficient uniformity (e.g., pseudo-random sampling [23]) or are not practical (e.g., stratified random sampling, where the number of samples needed is on the order of $2^n$). One of the most popular DOE techniques for (relatively) uniform sample distributions is Latin hypercube sampling (LHS) [27]. In order to allocate p samples with LHS, the range of each parameter is divided into p bins, which, for n design variables, yields a total number of $p^n$ bins in the design space. The samples are randomly selected in the design space so that (i) each sample is randomly placed inside a bin, and (ii) for all one-dimensional projections of the p samples and bins, there is exactly one sample in each bin. Figure 3.4 shows an LHS realization of 15 samples for two design variables (n = 2). It should be noted that standard LHS may lead to non-uniform distributions (for example, samples allocated along the design space diagonal satisfy conditions (i) and (ii)). Numerous improvements of standard LHS, e.g., [28]-[31], provide more uniform sampling distributions.
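The two LHS conditions translate directly into code. On every axis, a random permutation assigns each of the p samples to a different bin, which gives condition (ii). A uniform random offset then places the sample inside its bin, which gives condition (i). The following Python/NumPy sketch works in the unit hypercube $[0,1]^n$; scaling to the actual variable bounds is assumed to happen afterwards. `latin_hypercube` is a hypothetical helper name.

```python
import numpy as np


def latin_hypercube(p, n, rng=None):
    """Standard LHS of p samples in the unit hypercube [0, 1]^n."""
    rng = np.random.default_rng(rng)
    samples = np.empty((p, n))
    for k in range(n):
        bins = rng.permutation(p)                   # condition (ii): one sample per bin on axis k
        samples[:, k] = (bins + rng.random(p)) / p  # condition (i): random position inside the bin
    return samples


X = latin_hypercube(15, 2, rng=0)   # 15 samples, n = 2, as in Fig. 3.4
print(np.sort(np.floor(X * 15).astype(int), axis=0).T)  # each row should list 0, 1, ..., 14
```

Nothing in this sketch prevents poorly spread realizations, such as the diagonal case mentioned above. Avoiding those is the purpose of the improved variants.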

Other DOE methodologies commonly used include orthogonal array sampling [3], quasi-Monte Carlo sampling [23], or Hammersley sampling [23]. Sample distribution can be improved through the incorporation of optimization techniques that minimize a specific non-uniformity measure, e.g., $\sum_{i=1}^{p}\sum_{j=i+1}^{p} d_{ij}^{-2}$ [29], where $d_{ij}$ is the Euclidean distance between samples i and j.



Fig. 3.4 Latin hypercube sampling realization of 15 samples in a two-dimensional design 
space. 
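One simple way (among many) to use such a measure starts from a standard LHS. Random swaps of one coordinate between two samples are then accepted greedily whenever $\sum_{i<j} d_{ij}^{-2}$ decreases. A swap within a single column never breaks the Latin property. The sketch below is an illustrative assumption, not the specific algorithms of [28]-[31]. The function names and the iteration count are hypothetical.

```python
import numpy as np


def latin_hypercube(p, n, rng):
    """Standard LHS in [0, 1]^n: argsort of random columns gives one permutation per axis."""
    return (np.argsort(rng.random((p, n)), axis=0) + rng.random((p, n))) / p


def inverse_square_distance_sum(X):
    """Non-uniformity measure sum_{i<j} d_ij^(-2); smaller means more uniform."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    i, j = np.triu_indices(len(X), k=1)
    return np.sum(1.0 / d2[i, j])


def optimized_lhs(p, n, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    X = latin_hypercube(p, n, rng)
    start = best = inverse_square_distance_sum(X)
    for _ in range(iters):
        k = rng.integers(n)                           # column to modify
        a, b = rng.choice(p, size=2, replace=False)   # two samples to swap
        Y = X.copy()
        Y[[a, b], k] = Y[[b, a], k]                   # swap keeps one sample per bin
        value = inverse_square_distance_sum(Y)
        if value < best:
            X, best = Y, value
    return X, start, best


X, start, best = optimized_lhs(15, 2)
print(f"measure: {start:.1f} (standard LHS) -> {best:.1f} (after swaps)")
```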
3.3.2 Surrogate Modeling Techniques

Having selected the design of experiments technique and sampled the data, the 
next step is to choose an approximation model and a fitting methodology. In this 
section, we describe in some detail the most popular surrogate modeling tech- 
niques, and we briefly mention alternatives. 

3.3.2.1 Polynomial Regression 

Polynomial regression [3] assumes the following relation between the function of interest f and K polynomial basis functions $v_j$, using p samples $f(x^{(i)})$, i = 1, ..., p:

$$f(x^{(i)}) = \sum_{j=1}^{K} \beta_j v_j(x^{(i)}). \tag{3.3}$$

These equations can be represented in matrix form

$$f = X\beta, \tag{3.4}$$

where $f = [f(x^{(1)})\ f(x^{(2)})\ \ldots\ f(x^{(p)})]^T$, X is a p×K matrix containing the basis functions evaluated at the sample points, and $\beta = [\beta_1\ \beta_2\ \ldots\ \beta_K]^T$. The number of sample points p should be consistent with the number of basis functions considered, K (typically p ≥ K). If the sample points and basis functions are taken arbitrarily, some columns of X can be linearly dependent. If p ≥ K and rank(X) = K, a solution of (3.4) in the least-squares sense can be computed through $X^+$, the pseudoinverse (or generalized inverse) of X [32]:

$$\beta = X^+ f = (X^T X)^{-1} X^T f. \tag{3.5}$$

The simplest examples of regression models are the first- and second-order polynomial models

$$s(x) = s([x_1\ x_2\ \ldots\ x_n]^T) = \beta_0 + \sum_{j=1}^{n} \beta_j x_j, \tag{3.6}$$

$$s(x) = s([x_1\ x_2\ \ldots\ x_n]^T) = \beta_0 + \sum_{j=1}^{n} \beta_j x_j + \sum_{i=1}^{n}\sum_{j \le i} \beta_{ij} x_i x_j. \tag{3.7}$$

Polynomial interpolation/regression appears naturally and is crucial in developing 
robust and efficient derivative-free optimization algorithms. For more details, 
please refer to [33]. 
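To make (3.4)-(3.7) concrete, the sketch below assembles the matrix X for the second-order model (3.7). It then solves the least-squares problem with `numpy.linalg.lstsq`, which computes the same least-squares solution as the pseudoinverse in (3.5) via an SVD, without forming $X^T X$ explicitly. The helper names and the test function are hypothetical. The samples are called `P` so as not to clash with the matrix X of (3.4).

```python
import numpy as np


def quadratic_basis(P):
    """Columns of X for model (3.7): 1, x_j, and x_i * x_j with j <= i."""
    p, n = P.shape
    cols = [np.ones(p)]
    cols += [P[:, j] for j in range(n)]
    cols += [P[:, i] * P[:, j] for i in range(n) for j in range(i + 1)]
    return np.column_stack(cols)   # p x K, with K = 1 + n + n(n+1)/2


def fit_quadratic(P, f):
    beta, *_ = np.linalg.lstsq(quadratic_basis(P), f, rcond=None)   # least squares, as in (3.5)
    return beta


def predict_quadratic(beta, P):
    return quadratic_basis(P) @ beta


rng = np.random.default_rng(1)
P = rng.random((20, 2))                      # p = 20 samples, n = 2, so K = 6
f = 1 + 2 * P[:, 0] - P[:, 1] + 0.5 * P[:, 0] ** 2 + 3 * P[:, 0] * P[:, 1]
beta = fit_quadratic(P, f)
```

Because this test function is itself quadratic, the fitted model reproduces it up to rounding error at any point.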

3.3.2.2 Radial Basis Functions 

Radial basis function interpolation/approximation [4,34] exploits linear combinations of K radially symmetric functions φ:

$$s(x) = \sum_{j=1}^{K} \lambda_j \phi(\|x - c^{(j)}\|), \tag{3.8}$$

where $\lambda = [\lambda_1\ \lambda_2\ \ldots\ \lambda_K]^T$ is the vector of model parameters, and $c^{(j)}$, j = 1, ..., K, are the (known) basis function centers.

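As in polynomial regression, the model parameters λ can be computed by

$$\lambda = \Phi^+ f = (\Phi^T \Phi)^{-1} \Phi^T f, \tag{3.9}$$

where again $f = [f(x^{(1)})\ f(x^{(2)})\ \ldots\ f(x^{(p)})]^T$, and the p×K matrix Φ is defined as

$$\Phi = \begin{bmatrix} \phi(\|x^{(1)} - c^{(1)}\|) & \phi(\|x^{(1)} - c^{(2)}\|) & \cdots & \phi(\|x^{(1)} - c^{(K)}\|) \\ \phi(\|x^{(2)} - c^{(1)}\|) & \phi(\|x^{(2)} - c^{(2)}\|) & \cdots & \phi(\|x^{(2)} - c^{(K)}\|) \\ \vdots & \vdots & \ddots & \vdots \\ \phi(\|x^{(p)} - c^{(1)}\|) & \phi(\|x^{(p)} - c^{(2)}\|) & \cdots & \phi(\|x^{(p)} - c^{(K)}\|) \end{bmatrix}. \tag{3.10}$$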



If we select p = K (i.e., the number of basis functions is equal to the number of samples), and if the centers of the basis functions coincide with the data points (and these are all different), Φ is a regular square matrix (and thus, $\lambda = \Phi^{-1} f$).

Typical choices for the basis functions are $\phi(r) = r$, $\phi(r) = r^3$, or $\phi(r) = r^2 \ln r$ (thin plate spline). More flexibility can be obtained by using parametric basis functions such as $\phi(r) = \exp(-r^2/2\sigma^2)$ (Gaussian), $\phi(r) = (r^2 + \sigma^2)^{1/2}$ (multi-quadric), or $\phi(r) = (r^2 + \sigma^2)^{-1/2}$ (inverse multi-quadric).
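For the interpolating case p = K with centers at the data points, fitting a radial basis function model reduces to a single linear solve. The sketch below uses the Gaussian basis function with a user-chosen width σ. The value σ = 0.3, the 4×4 grid and the test function are arbitrary assumptions for illustration.

```python
import numpy as np


def gaussian_phi_matrix(P, centres, sigma):
    """Matrix (3.10) with phi(r) = exp(-r^2 / (2 sigma^2))."""
    r = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=-1)
    return np.exp(-r ** 2 / (2.0 * sigma ** 2))


def fit_rbf(P, f, sigma):
    """p = K and c^(j) = x^(j): Phi is square, so lambda = Phi^{-1} f."""
    return np.linalg.solve(gaussian_phi_matrix(P, P, sigma), f)


def predict_rbf(Pnew, centres, lam, sigma):
    return gaussian_phi_matrix(Pnew, centres, sigma) @ lam   # model (3.8)


g = np.linspace(0.0, 1.0, 4)
P = np.array([[a, b] for a in g for b in g])     # 16 samples on a 4 x 4 grid
f = np.sin(3 * P[:, 0]) + P[:, 1] ** 2
lam = fit_rbf(P, f, sigma=0.3)
print(predict_rbf(np.array([[0.5, 0.5]]), P, lam, sigma=0.3))
```

Larger σ values make Φ increasingly ill-conditioned. For this reason σ is often tuned in practice, for example with the validation methods of Section 3.3.3.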



3.3.2.3 Kriging 

Kriging is a popular technique for interpolating deterministic, noise-free data [35,10,21,36]. Kriging is a modeling method based on Gaussian processes [37], and the resulting models are compact and cheap to evaluate. Kriging has proven useful in a wide variety of fields (see, e.g., [4,38] for applications in optimization).

In its basic formulation, kriging [35,10] assumes that the function of interest f is of the form

$$f(x) = g(x)^T \beta + Z(x), \tag{3.11}$$

where $g(x) = [g_1(x)\ g_2(x)\ \ldots\ g_K(x)]^T$ are known (e.g., constant) functions, $\beta = [\beta_1\ \beta_2\ \ldots\ \beta_K]^T$ are the unknown model parameters, and Z(x) is a realization of a normally distributed Gaussian random process with zero mean and variance σ².

The regression part $g(x)^T \beta$ approximates the function f globally, and Z(x) takes into account localized variations. The covariance matrix of Z(x) is given as

$$\mathrm{Cov}[Z(x^{(i)})\, Z(x^{(j)})] = \sigma^2 R([R(x^{(i)}, x^{(j)})]), \tag{3.12}$$

where R is a p×p correlation matrix with $R_{ij} = R(x^{(i)}, x^{(j)})$. Here, $R(x^{(i)}, x^{(j)})$ is the correlation function between sampled data points $x^{(i)}$ and $x^{(j)}$. The most popular choice is the Gaussian correlation function

$$R(x, y) = \exp\Big[-\sum_{k=1}^{n} \theta_k |x_k - y_k|^2\Big], \tag{3.13}$$

where $\theta_k$ are unknown correlation parameters, and $x_k$ and $y_k$ are the kth components of the vectors x and y, respectively.
The kriging predictor [10,35] is defined as

$$s(x) = g(x)^T \beta + r^T(x) R^{-1} (f - G\beta), \tag{3.14}$$

where $r(x) = [R(x, x^{(1)})\ \ldots\ R(x, x^{(p)})]^T$, $f = [f(x^{(1)})\ f(x^{(2)})\ \ldots\ f(x^{(p)})]^T$, and G is a p×K matrix with $G_{ij} = g_j(x^{(i)})$.

The vector of model parameters β can be computed as

$$\beta = (G^T R^{-1} G)^{-1} G^T R^{-1} f. \tag{3.15}$$

An estimate of the variance σ² is given by

$$\sigma^2 = \frac{1}{p} (f - G\beta)^T R^{-1} (f - G\beta). \tag{3.16}$$

Model fitting is accomplished by maximum likelihood for θ [35]. In particular, the n-dimensional unconstrained nonlinear maximization problem with cost function

$$-\big(p \ln(\sigma^2) + \ln|R|\big)/2, \tag{3.17}$$

where the variance σ² and |R| are both functions of θ, is solved for positive values of θ as optimization variables.

It should be noted that, once the kriging-based surrogate has been obtained, the 
random process Z(x) gives valuable information regarding the approximation error 
that can be used for improving the surrogate [4,35]. 
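The following sketch implements the basic kriging equations for the common special case of a constant regression function g(x) = 1 (often called ordinary kriging). It uses the Gaussian correlation (3.13), and θ is obtained by maximizing (3.17) over log θ with SciPy's L-BFGS-B. Three details are numerical assumptions, not part of the formulation above: the small nugget added to the diagonal of R, the search bounds on log θ, and the starting point. The helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def gauss_corr(A, B, theta):
    """Gaussian correlation (3.13) between every row of A and every row of B."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-np.sum(theta * d2, axis=-1))


def _gls_fit(X, f, theta, nugget):
    """beta from (3.15), sigma^2 from (3.16) and ln|R| for a fixed theta."""
    p = len(X)
    cf = cho_factor(gauss_corr(X, X, theta) + nugget * np.eye(p))
    G = np.ones((p, 1))                      # g(x) = 1, i.e. K = 1 (constant trend)
    beta = np.linalg.solve(G.T @ cho_solve(cf, G), G.T @ cho_solve(cf, f))
    resid = f - G @ beta
    w = cho_solve(cf, resid)                 # R^{-1} (f - G beta), reused by (3.14)
    sigma2 = resid @ w / p
    log_det_R = 2.0 * np.sum(np.log(np.diag(cf[0])))
    return beta, sigma2, log_det_R, w


def fit_kriging(X, f, nugget=1e-10, log_theta_bounds=(-3.0, 8.0)):
    p, n = X.shape

    def neg_log_likelihood(log_theta):       # negative of (3.17)
        try:
            _, sigma2, log_det_R, _ = _gls_fit(X, f, np.exp(log_theta), nugget)
        except np.linalg.LinAlgError:        # R numerically not positive definite
            return 1e10
        value = 0.5 * (p * np.log(sigma2) + log_det_R)
        return value if np.isfinite(value) else 1e10

    # optimizing over log(theta) keeps every theta_k positive
    res = minimize(neg_log_likelihood, np.zeros(n), method="L-BFGS-B",
                   bounds=[log_theta_bounds] * n)
    theta = np.exp(res.x)
    beta, sigma2, _, w = _gls_fit(X, f, theta, nugget)
    return {"X": X, "theta": theta, "beta": beta, "sigma2": sigma2, "w": w}


def kriging_predict(model, Xnew):
    """Predictor (3.14); the rows of r are r(x)^T for each new point."""
    r = gauss_corr(np.atleast_2d(Xnew), model["X"], model["theta"])
    return model["beta"][0] + r @ model["w"]


X = np.linspace(0.0, 1.0, 10)[:, None]       # 10 samples of a one-dimensional function
f = np.sin(2 * np.pi * X[:, 0])
model = fit_kriging(X, f)
print(model["theta"], kriging_predict(model, np.array([[0.3], [0.62]])))
```

Apart from the tiny nugget, the predictor interpolates the training data.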

3.3.2.4 Neural Networks 

The basic structure in a neural network [39,40] is the neuron (or single-unit perceptron). A neuron performs an affine transformation followed by a nonlinear operation (see Fig. 3.5(a)). If the inputs to a neuron are denoted as $x_1, \ldots, x_n$, the neuron output y is computed as

$$y = \frac{1}{1 + \exp(-\eta/T)}, \tag{3.18}$$

where $\eta = w_1 x_1 + \ldots + w_n x_n + \gamma$, with $w_1, \ldots, w_n$ being regression coefficients. Here, γ is the bias value of a neuron, and T is a user-defined (slope) parameter. Neurons can be combined in multiple ways [39]. The most common neural network architecture is the multi-layer feed-forward network (see Fig. 3.5(b)).

The construction of a functional surrogate based on a neural network requires two main steps: (i) architecture selection, and (ii) network training. Network training can be stated as a nonlinear least-squares regression problem for a number of training points. Since the optimization cost function is nonlinear in all the optimization variables (the neuron coefficients), the solution cannot be written in closed form, as was the case before in (3.5) and (3.9). A very popular technique for solving this regression problem is the error back-propagation algorithm [10,39]. If the network architecture is sufficiently complex, a neural network can approximate a general set of functions [10]. However, in complicated cases (e.g., nonsmooth functions with a large number of variables) the underlying regression problem may be very difficult to solve.

[Figure: (a) single neuron; (b) network with input units, hidden units and output units]

Fig. 3.5. Neural networks: (a) neuron basic structure; (b) two-layer feed-forward neural network architecture.
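A compact NumPy sketch of a two-layer feed-forward network trained by error back-propagation is shown below. Hidden neurons use the activation (3.18). The output neuron is taken as linear, an assumption commonly made for regression that departs from the sigmoid neuron of (3.18). Training uses plain full-batch gradient descent on the mean squared error. The network size, learning rate, number of epochs and test function are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def neuron(eta, T=1.0):
    """Activation (3.18): y = 1 / (1 + exp(-eta / T))."""
    return 1.0 / (1.0 + np.exp(-eta / T))


def train_mlp(x, y, hidden=10, lr=0.2, epochs=5000, T=1.0, seed=0):
    """Two-layer network: sigmoid hidden layer, linear output unit."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(x.shape[1], hidden))    # input -> hidden weights
    b1 = np.zeros(hidden)                         # hidden biases (gamma)
    W2 = rng.normal(scale=0.1, size=(hidden, 1))  # hidden -> output weights
    b2 = np.zeros(1)
    p = len(x)
    history = []
    for _ in range(epochs):
        # forward pass
        h = neuron(x @ W1 + b1, T)                # hidden outputs, shape (p, hidden)
        err = h @ W2 + b2 - y                     # output error, shape (p, 1)
        history.append(float(np.mean(err ** 2)))
        # backward pass: gradients of 0.5 * mean squared error
        d_out = err / p
        d_hid = (d_out @ W2.T) * h * (1.0 - h) / T   # derivative of (3.18) w.r.t. eta
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (x.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), history


def mlp_predict(params, x, T=1.0):
    W1, b1, W2, b2 = params
    return neuron(x @ W1 + b1, T) @ W2 + b2


x = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(x)
params, history = train_mlp(x, y)
print(f"training MSE: {history[0]:.3f} -> {history[-1]:.4f}")
```

Unlike (3.5) or (3.9), no closed-form solution is used here. The weights are updated iteratively from the back-propagated gradients.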

3.3.2.5 Other Techniques 

This section covers some other approaches that have been gaining popularity recently. One of the most prominent approaches, which has proven to be a very general approximation tool, is support vector regression (SVR) [41,42]. SVR resorts to quadratic programming to solve the underlying optimization problem of the approximation procedure robustly [43]. SVR is a variant of the support vector machine (SVM) methodology developed by Vapnik [44], which was originally applied to classification problems. SVR/SVM exploits the structural risk minimization (SRM) principle, which has been shown (see, e.g., [41]) to be superior to the traditional empirical risk minimization (ERM) principle employed by several modeling technologies (e.g., neural networks). ERM is based on minimizing an error function for the set of training points. When the model structure is complex (e.g., higher-order polynomials), ERM-based surrogates often result in overfitting. SRM incorporates the model complexity in the regression, and therefore yields surrogates that may be more accurate outside of the training set.

Moving least squares (MLS) [45] is a technique particularly popular in aerospace engineering. MLS is formulated as weighted least squares (WLS) [46]. In MLS, the error contribution from each training point $x^{(i)}$ is multiplied by a weight $\omega_i$ that depends on the distance between x and $x^{(i)}$. A common choice for the weights is

$$\omega_i(\|x - x^{(i)}\|) = \exp(-\|x - x^{(i)}\|^2). \tag{3.19}$$

MLS is essentially an adaptive surrogate, and this additional flexibility can translate into more appealing designs (especially in computer graphics applications). However, MLS is computationally more expensive than WLS, since computing the approximation at each point x requires solving a new optimization problem.
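The defining feature of MLS is a fresh weighted least-squares fit for every query point, which is easy to see in code. The sketch below assumes a linear basis $[1, x_1, \ldots, x_n]$ and the weights (3.19) with an added width parameter h, where h = 1 recovers (3.19) exactly. Both choices are assumptions made for illustration, and `mls_predict` is a hypothetical name.

```python
import numpy as np


def mls_predict(Xq, X, f, h=1.0):
    """Moving least squares with a linear basis [1, x_1, ..., x_n]."""
    A = np.column_stack([np.ones(len(X)), X])
    predictions = []
    for x in np.atleast_2d(Xq):
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / h ** 2)   # weights (3.19) when h = 1
        sw = np.sqrt(w)
        # a new weighted least-squares problem for every query point x
        coef, *_ = np.linalg.lstsq(A * sw[:, None], f * sw, rcond=None)
        predictions.append(np.concatenate(([1.0], x)) @ coef)
    return np.array(predictions)


rng = np.random.default_rng(0)
X = rng.random((15, 2))
f = np.sin(3 * X[:, 0]) + X[:, 1]
print(mls_predict(np.array([[0.4, 0.6]]), X, f, h=0.3))
```

The fitted coefficients change with x because the weights do. A linear function is nevertheless reproduced exactly for any choice of weights.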
Gaussian process regression (GPR) [47] is a surrogate modeling technique that, like kriging, addresses the approximation problem from a stochastic point of view. From this perspective, and since Gaussian processes are mathematically tractable, it is relatively easy to compute error estimates for GPR-based surrogates in the form of uncertainty distributions. Under appropriate conditions, Gaussian processes can be shown to be equivalent to large neural networks [47]. Nevertheless, Gaussian process modeling typically requires far fewer regression parameters than neural networks.

3.3.3 Model Validation 

Some of the methodologies described above determine a surrogate model together with some estimate of the attendant approximation error (e.g., kriging or Gaussian process regression). Alternatively, there are procedures that can be used in a stand-alone manner to validate the prediction capability of a given model beyond the set of training points. A simple way to validate a model is the split-sample method [3]. In this approach, the set of available data samples is divided into two subsets. The first subset, called the training subset, contains the points considered for the construction of the surrogate. The second subset, the testing subset, is used purely for model validation. In general, the error estimated by a split-sample method depends strongly on how the set of data samples is partitioned. We also note that this approach does not make full use of the available samples, since the surrogate is based on only a subset of them.

Cross-validation [3,48] is an extremely popular methodology for verifying the prediction capabilities of a model generated from a set of samples. In cross-validation the data set is divided into L subsets, and each of these subsets is sequentially used as the testing set for a surrogate constructed on the other L − 1 subsets. If the number of subsets L is equal to the sample size p, the approach is called leave-one-out cross-validation [3]. The prediction error can be estimated from all the L error measures obtained in this process (for example, as an average value). Cross-validation provides an error estimate that is less biased than that of the split-sample method [3]. The disadvantage of this method is that the surrogate has to be constructed more than once. However, having multiple approximations may improve the robustness of the whole surrogate generation and validation approach, since all the available data is used for both training and testing purposes.
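Both validation procedures can be written generically in terms of a `fit` and a `predict` callable, so that any of the surrogates of Section 3.3.2 can be plugged in. In the sketch below, whose function names are hypothetical, the error measure on each testing subset is the root-mean-square error, and the L values are averaged. Leave-one-out corresponds to L = p. The usage lines compare first- and second-order polynomial surrogates (fitted with `numpy.polyfit`) on one-dimensional data.

```python
import numpy as np


def split_sample_error(x, f, fit, predict, test_fraction=0.3, seed=0):
    """Split-sample validation: one training subset, one testing subset."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = max(1, int(round(test_fraction * len(x))))
    test, train = idx[:n_test], idx[n_test:]
    model = fit(x[train], f[train])
    return np.sqrt(np.mean((predict(model, x[test]) - f[test]) ** 2))


def cross_validation_error(x, f, fit, predict, L, seed=0):
    """L-fold cross-validation; L = len(x) gives leave-one-out."""
    idx = np.random.default_rng(seed).permutation(len(x))
    errors = []
    for test in np.array_split(idx, L):
        train = np.setdiff1d(idx, test)           # the other L - 1 subsets
        model = fit(x[train], f[train])
        errors.append(np.sqrt(np.mean((predict(model, x[test]) - f[test]) ** 2)))
    return np.mean(errors)                        # average of the L error measures


x = np.linspace(-1.0, 1.0, 20)
f = 1 + x + 2 * x ** 2
for degree in (1, 2):
    fit = lambda xs, fs, d=degree: np.polyfit(xs, fs, d)
    err_split = split_sample_error(x, f, fit, np.polyval)
    err_loo = cross_validation_error(x, f, fit, np.polyval, L=len(x))
    print(f"degree {degree}: split-sample {err_split:.3g}, leave-one-out {err_loo:.3g}")
```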

3.3.4 Surrogate Correction 

In the first stages of any surrogate-based optimization procedure, it is desirable to use a surrogate that is valid globally in the search space [4] in order to avoid being trapped in local solutions with unacceptable cost function values. Once the search starts becoming local, the global accuracy of the initial surrogate may not be beneficial for making progress in the optimization¹. For this reason, surrogate correction is crucial within any SBO methodology.



1 As mentioned in Section 3.2, when solving the original optimization problem in (3.1) 
using a surrogate-based optimization framework, zero- and first-order local consistency 
conditions are essential for obtaining convergence to a first-order stationary point. 
In this section we will describe two strategies for improving surrogates locally. The corrections described in Section 3.3.4.1 are based on mapping objective function values. On some occasions, the cost function can be expressed as a function of a model response. Section 3.3.4.2 presents the space-mapping concept that gives rise to a whole surrogate-based optimization paradigm (see Section 3.4.2).

3.3.4.1 Objective Function Correction 

Most of the objective function corrections used in practice fall into one of three groups: compositional, additive, or multiplicative corrections. We will briefly illustrate each of these categories for correcting the surrogate $s^{(i)}(x)$, and discuss whether zero- and first-order consistency conditions with f(x) [7] can be satisfied. The following compositional correction [20]

$$s^{(i+1)}(x) = g(s^{(i)}(x)) \tag{3.20}$$

represents a simple scaling of the objective function. Since the mapping g is a real-valued function of a real variable, a compositional correction will not in general yield first-order consistency conditions. By selecting a mapping g that satisfies

$$g'(s^{(i)}(x^{(i)})) = \frac{\nabla f(x^{(i)})^T \nabla s^{(i)}(x^{(i)})}{\nabla s^{(i)}(x^{(i)})^T \nabla s^{(i)}(x^{(i)})}, \tag{3.21}$$

the discrepancy between $\nabla f(x^{(i)})$ and $\nabla s^{(i+1)}(x^{(i)})$ (expressed in the Euclidean norm) is minimized. It should be noted that the correction in (3.21), like many transformations that ensure first-order consistency, requires a high-fidelity gradient, which may be expensive to compute. However, numerical estimates of $\nabla f(x^{(i)})$ may yield acceptable results in practice.

The compositional correction can also be introduced in the parameter space [1]

$$s^{(i+1)}(x) = s^{(i)}(p(x)). \tag{3.22}$$

If $f(x^{(i)})$ is not in the range of $s^{(i)}(x)$, then the condition $s^{(i)}(p(x^{(i)})) = f(x^{(i)})$ is not achievable. We can overcome that issue by combining both compositional corrections. In that case, the following selection for g and p

$$g(t) = t - s^{(i)}(x^{(i)}) + f(x^{(i)}), \tag{3.23}$$

$$p(x) = x^{(i)} + J_p (x - x^{(i)}), \tag{3.24}$$

where $J_p$ is an n×n matrix for which $J_p^T \nabla s^{(i)}(x^{(i)}) = \nabla f(x^{(i)})$, guarantees consistency.

Additive and multiplicative corrections allow first-order consistency conditions to be obtained. For the additive case we can generally express the correction as

$$s^{(i+1)}(x) = \lambda(x) + s^{(i)}(x). \tag{3.25}$$

The associated consistency conditions require that λ(x) satisfies

$$\lambda(x^{(i)}) = f(x^{(i)}) - s^{(i)}(x^{(i)}), \tag{3.26}$$

and

$$\nabla\lambda(x^{(i)}) = \nabla f(x^{(i)}) - \nabla s^{(i)}(x^{(i)}). \tag{3.27}$$

These requirements are satisfied by the following linear additive correction:

$$s^{(i+1)}(x) = f(x^{(i)}) - s^{(i)}(x^{(i)}) + \big(\nabla f(x^{(i)}) - \nabla s^{(i)}(x^{(i)})\big)^T (x - x^{(i)}) + s^{(i)}(x). \tag{3.28}$$
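The consistency conditions (3.26)-(3.27) are easy to verify numerically. In the sketch below, `f` and `s` are hypothetical smooth test functions standing in for the high-fidelity model and the current surrogate. Their gradients are coded analytically for n = 2. The gradient of the corrected surrogate (3.28) is then checked with central finite differences.

```python
import numpy as np


def additive_correction(s, x_i, f_i, grad_f_i, s_i, grad_s_i):
    """Linear additive correction (3.28) built around the point x_i."""
    def corrected(x):
        return f_i - s_i + (grad_f_i - grad_s_i) @ (x - x_i) + s(x)
    return corrected


def central_difference_gradient(fun, x, h=1e-6):
    e = np.eye(len(x))
    return np.array([(fun(x + h * e[k]) - fun(x - h * e[k])) / (2 * h) for k in range(len(x))])


def f(x):                                 # hypothetical "high-fidelity" function
    return np.sum(x ** 2) + np.sin(x[0])


def grad_f(x):                            # written for n = 2
    return 2 * x + np.array([np.cos(x[0]), 0.0])


def s(x):                                 # hypothetical current surrogate
    return np.sum((x - 0.1) ** 2)


def grad_s(x):
    return 2 * (x - 0.1)


x_i = np.array([0.3, -0.7])
s_next = additive_correction(s, x_i, f(x_i), grad_f(x_i), s(x_i), grad_s(x_i))
print(s_next(x_i) - f(x_i))                                     # zero-order consistency
print(central_difference_gradient(s_next, x_i) - grad_f(x_i))   # first-order consistency
```

Both printed quantities should be zero up to rounding and finite-difference error.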

Multiplicative corrections (also known as the β-correlation method [20]) can be represented generically by

$$s^{(i+1)}(x) = \alpha(x)\, s^{(i)}(x). \tag{3.29}$$

Assuming that $s^{(i)}(x^{(i)}) \neq 0$, zero- and first-order consistency can be achieved if

$$\alpha(x^{(i)}) = \frac{f(x^{(i)})}{s^{(i)}(x^{(i)})}, \tag{3.30}$$

and

$$\nabla\alpha(x^{(i)}) = \Big[\nabla f(x^{(i)}) - \frac{f(x^{(i)})}{s^{(i)}(x^{(i)})} \nabla s^{(i)}(x^{(i)})\Big] \Big/ s^{(i)}(x^{(i)}). \tag{3.31}$$

The requirement $s^{(i)}(x^{(i)}) \neq 0$ is not restrictive in practice since very often the range of f(x) (and thus, of the surrogate $s^{(i)}(x)$) is known beforehand, and hence, a bias can be introduced both for f(x) and $s^{(i)}(x)$ to avoid cost function values equal to zero. In these circumstances the following multiplicative correction

$$s^{(i+1)}(x) = \Big[\frac{f(x^{(i)})}{s^{(i)}(x^{(i)})} + \frac{\big(\nabla f(x^{(i)})\, s^{(i)}(x^{(i)}) - f(x^{(i)})\, \nabla s^{(i)}(x^{(i)})\big)^T (x - x^{(i)})}{\big(s^{(i)}(x^{(i)})\big)^2}\Big]\, s^{(i)}(x) \tag{3.32}$$

is consistent with conditions (3.30) and (3.31).
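The same kind of numerical check can be applied to the multiplicative correction (3.32). Here the linear factor α(x) is built from (3.30) and (3.31). The hypothetical test functions include a constant bias, so the surrogate stays away from zero, as recommended above.

```python
import numpy as np


def multiplicative_correction(s, x_i, f_i, grad_f_i, s_i, grad_s_i):
    """Linear multiplicative correction (3.32) around x_i; requires s_i != 0."""
    a0 = f_i / s_i                                     # alpha(x_i), condition (3.30)
    a1 = (grad_f_i * s_i - f_i * grad_s_i) / s_i ** 2  # grad alpha(x_i), condition (3.31)

    def corrected(x):
        return (a0 + a1 @ (x - x_i)) * s(x)
    return corrected


def central_difference_gradient(fun, x, h=1e-6):
    e = np.eye(len(x))
    return np.array([(fun(x + h * e[k]) - fun(x - h * e[k])) / (2 * h) for k in range(len(x))])


def f(x):                                  # hypothetical "high-fidelity" function (biased)
    return 2.0 + np.sum(x ** 2) + np.sin(x[0])


def grad_f(x):                             # written for n = 2
    return 2 * x + np.array([np.cos(x[0]), 0.0])


def s(x):                                  # hypothetical surrogate, positive thanks to the bias
    return 1.5 + np.sum((x - 0.1) ** 2)


def grad_s(x):
    return 2 * (x - 0.1)


x_i = np.array([0.3, -0.7])
s_next = multiplicative_correction(s, x_i, f(x_i), grad_f(x_i), s(x_i), grad_s(x_i))
print(s_next(x_i) - f(x_i), central_difference_gradient(s_next, x_i) - grad_f(x_i))
```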

3.3.4.2 Space Mapping Concept 

Space mapping (SM) [1,5,6] is a well-known methodology for correcting a given (either functional or physical) surrogate. SM algorithms aim at objective functions f(x) that can be written as a functional U of a so-called system response $R_f(x) \in \mathbb{R}^m$:

$$f(x) = U(R_f(x)). \tag{3.33}$$

The fine model response $R_f(x)$ is assumed to be accurate but computationally expensive. The coarse model response $R_c(x) \in \mathbb{R}^m$ is much cheaper to evaluate than the fine model response, at the expense of being only an approximation of it. SM establishes a correction between model responses rather than between objective functions. The corrected model response will be denoted as $R_s(x; p_{SM}) \in \mathbb{R}^m$, where $p_{SM}$ represents a set of parameters that describes the type of correction performed.

We can find in the literature four different groups of coarse model response 
corrections [1,5]: 



48 S. Koziel, D. Echeverna Ciaurri, and L. Leifsson 

1. Input space mapping [1]. The response correction is based on an affine transformation on the low-fidelity model parameter space. Example: $R_s(x; p_{SM}) = R_s(x; B, c) = R_c(B \cdot x + c)$.

2. Output space mapping [5]. The response correction is based on an affine transformation on the low-fidelity model response. Example: $R_s(x; p_{SM}) = R_s(x; A, d) = A \cdot R_c(x) + d$. Manifold mapping (see Section 3.4.3) is a particular case of output space mapping.

3. Implicit space mapping [49]. In some cases, there are additional parameters $x_p \in \mathbb{R}^{n_p}$ in the coarse model response $R_c(x; x_p)$ that can be tuned to better align the fine and coarse model responses. Example: $R_s(x; p_{SM}) = R_s(x; x_p) = R_c(x; x_p)$. These additional parameters are known in the SM lexicon as pre-assigned parameters, and are in general different from the optimization variables x.

4. Custom corrections that exploit the structure of the given design problem [1]. On many occasions the model responses are obtained through the sweeping of some parameter t:

$$R_f(x) = R_f(x; t) = [R_f(x; t_1)\ R_f(x; t_2)\ \ldots\ R_f(x; t_m)]^T, \tag{3.34}$$

$$R_c(x) = R_c(x; t) = [R_c(x; t_1)\ R_c(x; t_2)\ \ldots\ R_c(x; t_m)]^T. \tag{3.35}$$

Examples of this situation appear when the parameter t represents time or frequency. The response correction considered in this case² could be based on an affine transformation on the sweeping parameter space:

$$R_s(x; p_{SM}) = R_s(x; r_0, r_1) = R_c(x; r_0 + r_1 t). \tag{3.36}$$

In Fig. 3.6 we illustrate by means of block diagrams the four SM-based correction 
strategies introduced above, together with a combination of three of them. 

The surrogate response is usually optimized with respect to the SM parameters $p_{SM}$ in order to reduce the model discrepancy for all or part of the available data $R_f(x^{(1)}), R_f(x^{(2)}), \ldots, R_f(x^{(p)})$:

$$p_{SM}^{(i)} = \arg\min_{p_{SM}} \sum_{k=1}^{p} w^{(k)} \big\|R_f(x^{(k)}) - R_s(x^{(k)}; p_{SM})\big\|, \tag{3.37}$$

where $0 \le w^{(k)} \le 1$ are weights for each of the samples. The corrected surrogate $R_s(x; p_{SM})$ can be used as an approximation to the fine response $R_f(x)$ in the vicinity of the sampled data. The minimization in (3.37) is known in the SM literature as parameter extraction [1]. Solving this optimization problem is not free of difficulties, since in many cases the problem is ill-conditioned. A number of techniques for addressing parameter extraction in a robust manner can be found in [1].



2 This type of space mapping is known as frequency space mapping [4], and it was origi- 
nally proposed in microwave engineering applications (in these applications t usually 
refers to frequency). 
Fig. 3.6 Basic space-mapping surrogate correction types: (a) input SM, (b) output SM, (c) implicit SM, (d) frequency SM, and (e) composite using input, output and frequency SM.
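As an example of parameter extraction (3.37), consider output space mapping with a diagonal matrix A = diag(a) and a shift vector d. Suppose the squared Euclidean norm is used in (3.37) instead of the plain norm, which is an assumption made here so that the problem becomes linear. The extraction then decouples into one small weighted linear least-squares problem per response component. The sketch and the toy responses below are hypothetical. With a single sample (p = 1), each of these small problems is underdetermined, which is a simple instance of the ill-conditioning mentioned above.

```python
import numpy as np


def extract_output_sm(Rf, Rc, w=None):
    """Output-SM parameter extraction with A = diag(a) and shift d.

    Rf, Rc: arrays of shape (p, m) holding fine and coarse responses at the
    p samples; w: optional sample weights in [0, 1].  Minimizes
    sum_k w_k ||Rf_k - (a * Rc_k + d)||_2^2 component by component.
    """
    p, m = Rf.shape
    sw = np.sqrt(np.ones(p) if w is None else np.asarray(w, dtype=float))
    a, d = np.empty(m), np.empty(m)
    for j in range(m):
        M = np.column_stack([Rc[:, j], np.ones(p)]) * sw[:, None]
        (a[j], d[j]), *_ = np.linalg.lstsq(M, Rf[:, j] * sw, rcond=None)
    return a, d


# Hypothetical toy responses sampled at m = 5 values of a sweeping parameter t
t = np.linspace(0.1, 1.0, 5)
coarse = lambda x: x[0] * t + x[1] * t ** 2
fine = lambda x: 1.1 * coarse(x) + 0.05        # fine = affine distortion of coarse
samples = np.random.default_rng(1).random((4, 2))
Rc = np.array([coarse(x) for x in samples])
Rf = np.array([fine(x) for x in samples])
a, d = extract_output_sm(Rf, Rc)
corrected = lambda x: a * coarse(x) + d        # R_s(x; A, d) = A R_c(x) + d
```

Because the toy fine model is an exact affine distortion of the coarse one, the extraction recovers a = 1.1 and d = 0.05 in every component.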



3.4 Surrogate-Based Optimization Techniques 



In this section, we will introduce several optimization strategies that exploit surrogate models. More specifically, we will describe approximation model management optimization [7], space mapping [5], manifold mapping [8], and the surrogate management framework [9]. The first three approaches follow the surrogate-based optimization framework presented in Section 3.2. We will conclude the section with a brief discussion on addressing the trade-off between exploration and exploitation in the optimization process.
3.4.1 Approximation Model Management Optimization

Approximation model management optimization (AMMO) [7] relies on trust- 
region gradient-based optimization combined with the multiplicative linear surro- 
gate correction (3.32) introduced in Section 3.3.4.1. 

The basic AMMO algorithm can be summarized as follows:

1. Set the initial guess $x^{(0)}$, $s^{(0)}(x)$, and i = 0, and select the initial trust-region radius δ > 0.

2. If i > 0, then $s^{(i)}(x) = \alpha(x)\, s^{(i-1)}(x)$.

3. Solve $h^* = \arg\min_h s^{(i)}(x^{(i)} + h)$ subject to $\|h\|_\infty \le \delta$.

4. Calculate $\rho = \big(f(x^{(i)}) - f(x^{(i)} + h^*)\big) / \big(s^{(i)}(x^{(i)}) - s^{(i)}(x^{(i)} + h^*)\big)$.

5. If $f(x^{(i)}) > f(x^{(i)} + h^*)$, then set $x^{(i+1)} = x^{(i)} + h^*$; otherwise set $x^{(i+1)} = x^{(i)}$.

6. Update the search radius δ based on the value of ρ.

7. Set i = i + 1, and if the termination condition is not satisfied, go to Step 2.

Additional constraints can also be incorporated in the optimization through Step 3. AMMO can also be extended to cases where the constraints are expensive to evaluate and can be approximated by surrogates [50]. The search radius δ is updated using the standard trust-region rules [11,51]. We reiterate that the surrogate correction considered yields zero- and first-order consistency with f(x). Since this surrogate-based approach is safeguarded by means of a trust-region method, the whole scheme can be proven to be globally convergent to a first-order stationary point of the original optimization problem (3.1).
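The AMMO steps can be sketched in Python as below, under several assumptions:

- The test functions are hypothetical, with a constant bias so that neither takes the value zero, as suggested in Section 3.3.4.1.
- Gradients are analytic.
- The trust-region subproblem of Step 3 is solved with SciPy's bound-constrained L-BFGS-B.
- The radius update in Step 6 uses common textbook thresholds (0.25 and 0.75).
- For simplicity, each corrected surrogate is built from the base model $s^{(0)}$ rather than by compounding the corrections of Step 2. This still gives zero- and first-order consistency at $x^{(i)}$.

```python
import numpy as np
from scipy.optimize import minimize


# Hypothetical test problem; the constant 3 is a bias keeping both functions nonzero.
def f(x):
    return 3.0 + (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2 + 0.1 * x[0] ** 4


def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0) + 0.4 * x[0] ** 3, 4.0 * (x[1] + 0.5)])


def s0(x):                                   # cheap surrogate s^(0)
    return 3.0 + (x[0] - 0.7) ** 2 + 1.5 * (x[1] + 0.3) ** 2


def grad_s0(x):
    return np.array([2.0 * (x[0] - 0.7), 3.0 * (x[1] + 0.3)])


def ammo(x0, delta=0.5, max_iter=100, grad_tol=1e-6, delta_min=1e-8):
    x = np.asarray(x0, dtype=float)                      # Step 1
    for _ in range(max_iter):
        fx, gfx = f(x), grad_f(x)
        if np.linalg.norm(gfx) < grad_tol or delta < delta_min:
            break                                        # termination test (Step 7)
        sx, gsx = s0(x), grad_s0(x)
        a0 = fx / sx                                     # alpha(x_i), (3.30)
        a1 = (gfx * sx - fx * gsx) / sx ** 2             # grad alpha(x_i), (3.31)

        def s_i(z):                                      # Step 2: correction (3.32)
            return (a0 + a1 @ (z - x)) * s0(z)

        def grad_s_i(z):
            return a1 * s0(z) + (a0 + a1 @ (z - x)) * grad_s0(z)

        box = [(xk - delta, xk + delta) for xk in x]     # Step 3: ||h||_inf <= delta
        res = minimize(s_i, x, jac=grad_s_i, method="L-BFGS-B", bounds=box)
        h = res.x - x
        predicted = fx - res.fun                         # s_i(x) equals f(x)
        actual = fx - f(res.x)
        rho = actual / predicted if predicted > 0 else -1.0   # Step 4
        if actual > 0:                                   # Step 5
            x = res.x
        if rho < 0.25:                                   # Step 6 (assumed thresholds)
            delta *= 0.25
        elif rho > 0.75 and np.max(np.abs(h)) > 0.99 * delta:
            delta *= 2.0
    return x


x_best = ammo([-1.0, 1.0])
print(x_best, f(x_best))
```

Only the surrogate is minimized inside the trust region; f and its gradient are evaluated once per iteration, at the current and trial points.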



3.4.2 Space Mapping 

The space mapping (SM) paradigm [1,5] was originally developed in microwave 
engineering optimal design applications, and gave rise to an entire family of sur- 
rogate-based optimization approaches. Nowadays, its popularity is spreading 
across several engineering disciplines [52,53,1]. The initial space-mapping opti- 
mization methodologies were based on input SM [1], i.e., a linear correction of the 
coarse model design space. This kind of correction is well suited for many engi- 
neering problems, particularly in electrical engineering, where the model discrep- 
ancy is mostly due to second-order effects (e.g., presence of parasitic compo- 
nents). In these applications the model response ranges are often similar in shape, 
but slightly distorted and/or shifted with respect to a sweeping parameter 
(e.g., signal frequency). 

Space mapping can be incorporated in the SBO framework by just identifying the sequence of surrogates with

$$s^{(0)}(x) = U(R_c(x)), \tag{3.38}$$

and

$$s^{(i)}(x) = U(R_s(x; p_{SM}^{(i)})), \tag{3.39}$$

for i > 0. The parameters $p_{SM}^{(i)}$ are obtained by parameter extraction as in (3.37).
The accuracy of the corrected surrogate will clearly depend on the quality of the coarse model response [16]. In microwave design applications it has often been observed that the number of points p needed for obtaining a satisfactory SM-based corrected surrogate is on the order of the number of optimization variables n [1]. Though output SM can be used to obtain both zero- and first-order consistency conditions with f(x), many other SM-based optimization algorithms that have been applied in practice do not satisfy those conditions, and on some occasions convergence problems have been identified [14]. Additionally, the choice of an adequate SM correction approach is not always obvious [14]. However, on multiple occasions and in several different disciplines [52,53,1], space mapping has been reported as a very efficient means for obtaining satisfactory optimal designs.

Convergence properties of space-mapping optimization algorithms can be improved when these are safeguarded by a trust region [54]. Similarly to AMMO, the SM surrogate model optimization is restricted to a neighborhood of $x^{(i)}$ (this time by using the Euclidean norm) as follows:

$$x^{(i+1)} = \arg\min_{x} s^{(i)}(x) \quad \text{subject to} \quad \|x - x^{(i)}\|_2 \le \delta^{(i)}, \tag{3.40}$$

where $\delta^{(i)}$ denotes the trust-region radius at iteration i. The trust region is updated at every iteration by means of precise criteria [11]. A number of enhancements for space mapping have been suggested recently in the literature (e.g., zero-order and approximate/exact first-order consistency conditions with f(x) [54], or adaptively constrained parameter extraction [55]).

The quality of a surrogate within space mapping can be assessed by means of 
the techniques described in [14,16]. These methods are based on evaluating the 
high-fidelity model at several points (and thus, they require some extra 
computational effort). With that information, some conditions required for 
convergence are approximated numerically, and as a result, low-fidelity models 
can be compared based on these approximate conditions. The quality assessment 
algorithms presented in [14,16] can also be embedded into SM optimization 
algorithms in order to throw some light on the delicate issue of selecting the most 
adequate SM surrogate correction. 

It should be emphasized that space mapping is not a general-purpose optimization approach. The existence of a computationally cheap and sufficiently accurate low-fidelity model is an important prerequisite for this technique. If such a coarse model does exist, satisfactory designs are often
obtained by space mapping after a relatively small number of evaluations of the 
high-fidelity model. This number is usually on the order of the number of 
optimization variables n [14], and very frequently represents a dramatic reduction 
in the computational cost required for solving the same optimization problem with 
other methods that do not rely on surrogates. In the absence of the above- 
mentioned low-fidelity model, space-mapping optimization algorithms may not 
perform efficiently. 
3.4.3 Manifold Mapping

Manifold mapping (MM) [8,56] is a particular case of output space mapping, which is supported by convergence theory [13,56] and does not require the parameter extraction step shown in (3.37). Manifold mapping can be integrated in the SBO framework by just considering $s^{(i)}(x) = U(R_s^{(i)}(x))$ with the response correction for i ≥ 0 defined as

$$R_s^{(i)}(x) = R_f(x^{(i)}) + S^{(i)}\big(R_c(x) - R_c(x^{(i)})\big), \tag{3.41}$$

where $S^{(i)}$, for i ≥ 1, is the following m×m matrix

$$S^{(i)} = \Delta F\, \Delta C^{\dagger}, \tag{3.42}$$

with

$$\Delta F = \big[R_f(x^{(i)}) - R_f(x^{(i-1)})\ \ \ldots\ \ R_f(x^{(i)}) - R_f(x^{(\max\{i-n,0\})})\big], \tag{3.43}$$

$$\Delta C = \big[R_c(x^{(i)}) - R_c(x^{(i-1)})\ \ \ldots\ \ R_c(x^{(i)}) - R_c(x^{(\max\{i-n,0\})})\big]. \tag{3.44}$$

The matrix $S^{(0)}$ is typically taken as the identity matrix $I_m$. Here, † denotes the pseudoinverse operator, defined for ΔC as

$$\Delta C^{\dagger} = V_{\Delta C}\, \Sigma_{\Delta C}^{\dagger}\, U_{\Delta C}^T, \tag{3.45}$$

where $U_{\Delta C}$, $\Sigma_{\Delta C}$ and $V_{\Delta C}$ are the factors in the singular value decomposition of ΔC. The matrix $\Sigma_{\Delta C}^{\dagger}$ is the result of inverting the nonzero entries in $\Sigma_{\Delta C}$, leaving the zeroes invariant [8]. Some mild general assumptions on the model responses are made in theory [56] so that every pseudoinverse introduced is well defined. The response correction $R_s^{(i)}(x)$ is an approximation of

$$R_s^*(x) = R_f(x^*) + S^*\big(R_c(x) - R_c(x^*)\big), \tag{3.46}$$

with $S^*$ being the m×m matrix defined as

$$S^* = J_f(x^*)\, J_c^{\dagger}(x^*), \tag{3.47}$$

where $J_f(x^*)$ and $J_c(x^*)$ stand for the fine and coarse model response Jacobians, respectively, evaluated at $x^*$. Obviously, neither $x^*$ nor $S^*$ is known beforehand. Therefore, one needs to use an iterative approximation, such as the one in (3.41)-(3.45), in the actual manifold-mapping algorithm.

The manifold-mapping model alignment is illustrated in Fig. 3.7 for the least-squares optimization problem

$$U(R_f(x)) = \|R_f(x) - y\|_2^2, \tag{3.48}$$

with $y \in \mathbb{R}^m$ being the given design specifications. In that figure the point $x_c^*$ denotes the minimizer corresponding to the coarse model cost function $U(R_c(x))$. We note that, in the absence of constraints, the optimality associated with (3.48) translates into orthogonality between the tangent plane for $R_f(x)$ at $x^*$ and the vector $R_f(x^*) - y$.
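The iteration (3.41)-(3.45) for the least-squares problem (3.48) can be sketched as follows. To keep the example self-contained, the coarse model is assumed linear, $R_c(x) = Cx$. Each surrogate optimization is then a linear least-squares problem, solved here for the step $x - x^{(i)}$. When $S^{(i)}C$ is rank deficient, as happens at i = 1 because ΔC has a single column, `numpy.linalg.lstsq` returns the minimum-norm step. This is an implementation choice made here, not part of the method as stated. The fine and coarse models, the specifications y and the stopping rule are hypothetical.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 5)                   # sweeping parameter, m = 5
C = np.column_stack([np.ones_like(t), t])      # coarse model R_c(x) = C x (linear)


def R_c(x):
    return C @ x


def R_f(x):                                    # hypothetical fine model
    return C @ x + 0.2 * np.sin(x[0]) * t ** 2


def manifold_mapping(y, max_iter=50, tol=1e-9):
    n = C.shape[1]
    xs = [np.linalg.lstsq(C, y, rcond=None)[0]]  # x^(0): coarse-model optimum
    Fs = [R_f(xs[0])]
    S = np.eye(len(y))                           # S^(0) = I_m
    for _ in range(max_iter):
        # minimize ||R_f(x_i) + S (R_c(x) - R_c(x_i)) - y||_2 over the step h = x - x_i
        h = np.linalg.lstsq(S @ C, y - Fs[-1], rcond=None)[0]
        xs.append(xs[-1] + h)
        Fs.append(R_f(xs[-1]))
        if np.linalg.norm(h) < tol:
            break
        j = len(xs) - 1                          # index of the newest point
        older = range(j - 1, max(j - n, 0) - 1, -1)
        dF = np.column_stack([Fs[j] - Fs[k] for k in older])           # (3.43)
        dC = np.column_stack([R_c(xs[j]) - R_c(xs[k]) for k in older])  # (3.44)
        S = dF @ np.linalg.pinv(dC)              # (3.42); pinv is SVD-based, cf. (3.45)
    return xs[-1], len(xs) - 1


x_true = np.array([0.5, -1.0])
y = R_f(x_true)                                 # specifications reachable by the fine model
x_mm, iterations = manifold_mapping(y)
print(x_mm, iterations)
```

No parameter extraction is needed. The matrix $S^{(i)}$ is assembled directly from differences of already computed fine and coarse responses.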
If the low-fidelity model has a negligible computational cost compared to the high-fidelity one, the MM surrogate can be explored globally. In this case, the MM algorithm is endowed with some robustness against being trapped in unsatisfactory local minima.

For least-squares optimization problems as in (3.48), manifold mapping is supported by mathematically sound convergence theory [13]. We can identify four factors relevant for the convergence of the scheme above to the fine model optimizer $x^*$:

1. The model responses being smooth.

2. The surrogate optimization in (3.2) being well-posed.

3. The discrepancy of the optimal model response $R_f(x^*)$ with respect to the design specifications being sufficiently small.

4. The low-fidelity model response being a sufficiently good approximation of the high-fidelity model response.

In most practical situations the requirements associated with the first three factors are satisfied. Moreover, the low-fidelity models often considered are based on expert knowledge accumulated over the years, so the similarity between the model responses is frequently good enough for convergence.

Manifold-mapping algorithms can be expected to converge for a sufficiently smooth merit function $U$. Since the correction in (3.41) does not involve $U$, manifold mapping may still yield satisfactory solutions when the model responses are smooth enough, even if $U$ is not differentiable. The experimental evidence given in [57] for designs based on minimax objective functions indicates that the MM approach can be used successfully in more general situations than those for which theoretical results have been obtained.

The basic manifold-mapping algorithm can be modified in a number of ways. 
Convergence appears to improve if derivative information is introduced in the al- 
gorithm [13]. The incorporation of a Levenberg-Marquardt strategy in manifold 
mapping [58] can be seen as a convergence safeguard analogous to a trust-region 
method [11]. Manifold mapping can also be extended to designs where the con- 
straints are determined by time-consuming functions, and for which surrogates are 
available as well [59]. 

3.4.4 Surrogate Management Framework 

The surrogate management framework (SMF) [9] is mainly based on pattern search. Pattern search [60] is a general class of derivative-free optimizers that can be proven to be globally convergent to first-order stationary points. A pattern search optimization algorithm explores the search space by means of a structured set of points (pattern or stencil) that is modified along iterations. The pattern search scheme considered in [9] has two main steps per iteration: search and poll. Each iteration starts with a pattern of size $\Delta$ centered at $x^{(i)}$. The search step is optional and is always performed before the poll step. In the search stage a (small) number of points are selected from the search space (typically by means of a surrogate), and the cost function $f(x)$ is evaluated at these points. If the cost function for some of them improves on $f(x^{(i)})$, the search step is declared successful, the current pattern is centered at this new point, and a new search step is started. Otherwise a poll step is taken. Polling requires computing $f(x)$ for points in the pattern. If one of these points is found to improve on $f(x^{(i)})$, the poll step is declared successful, the pattern is translated to this new point, and a new search step is performed. Otherwise the whole pattern search iteration is considered unsuccessful and the termination condition is checked. This stopping criterion is typically based on the pattern size $\Delta$ [9,61]. If, after the unsuccessful pattern search iteration, another iteration is needed, the pattern size $\Delta$ is decreased, and a new search step is taken with the pattern centered again at $x^{(i)}$. Surrogates are incorporated in the SMF through the search step. For example, kriging (with Latin hypercube sampling) is considered in the SMF application studied in [61].

Fig. 3.7 Illustration of the manifold-mapping model alignment for a least-squares optimization problem. The point $x_c^{*}$ denotes the minimizer corresponding to the coarse model response, and the point $y$ is the vector of design specifications. Thin solid and dashed straight lines denote the tangent planes for the fine and coarse model response at their optimal designs, respectively. By the linear correction $S^{*}$, the point $R_c(x_c^{*})$ is mapped to $R_f(x^{*})$, and the tangent plane for $R_c(x)$ at $R_c(x_c^{*})$ to the tangent plane for $R_f(x)$ at $R_f(x^{*})$ [13].
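The search/poll logic described above can be sketched compactly. This sketch assumes the poll directions $\{\pm e_k\}$ and halving of $\Delta$ after an unsuccessful iteration. Both are common choices, not necessarily those of [9]. The `search` callable stands in for a surrogate. The one used in the demo simply proposes the minimizer of a slightly shifted quadratic and is hypothetical.

```python
import numpy as np

def _first_improvement(f, points, x, fx):
    """Evaluate f at the points in order; return the first one improving on fx."""
    for p in points:
        p = np.asarray(p, dtype=float)
        fp = f(p)
        if fp < fx:
            return True, p, fp
    return False, x, fx

def smf_pattern_search(f, x0, delta=1.0, delta_min=1e-6, search=None, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    directions = np.vstack([np.eye(n), -np.eye(n)])   # {+e_k, -e_k}: a positive spanning set
    for _ in range(max_iter):
        if search is not None:                          # optional search step (surrogate)
            ok, x, fx = _first_improvement(f, search(x, delta), x, fx)
            if ok:
                continue                                # recenter and start a new search step
        poll_points = [x + delta * d for d in directions]
        ok, x, fx = _first_improvement(f, poll_points, x, fx)
        if ok:
            continue                                    # poll success: translate the pattern
        if delta < delta_min:                           # unsuccessful iteration: terminate?
            break
        delta *= 0.5                                    # otherwise shrink the pattern
    return x, fx

f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2   # hypothetical "expensive" cost

def coarse_search(x, delta):
    # hypothetical surrogate: proposes the minimizer of a cheap, slightly shifted model
    return [np.array([0.9, -2.1])]

x_opt, f_opt = smf_pattern_search(f, [0.0, 0.0], search=coarse_search)
print(x_opt, f_opt)
```

Here the surrogate proposal brings the iterate close to the optimum in one step, and polling then refines it. A surrogate only has to propose promising points, not accurate ones, which is why the search step can be optional.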

In order to guarantee convergence to a stationary point, the set of vectors formed by each pattern point and the pattern center should be a generating (or positive spanning) set [60,61]. A generating set for $\mathbb{R}^n$ consists of a set of vectors whose non-negative linear combinations span $\mathbb{R}^n$. Generating sets are crucial in proving convergence (for smooth objective functions) due to the following property: if a generating set is centered at $x^{(i)}$ and $\nabla f(x^{(i)}) \neq 0$, then at least one of the vectors in the generating set defines a descent direction [60]. Therefore, if $f(x)$ is smooth and $\nabla f(x^{(i)}) \neq 0$, we can expect that for a small enough pattern size $\Delta$, some of the points in the associated stencil will improve on $f(x^{(i)})$.
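The descent property can be checked numerically. The sketch below is an illustration rather than a proof. It tests the coordinate set $\{\pm e_k\}$ and the minimal positive basis $\{e_1, \dots, e_n, -\sum_k e_k\}$ against random gradients, and shows that $\{+e_k\}$ alone is not a generating set.

```python
import numpy as np

def descent_directions(D, grad):
    """Rows d of D with grad . d < 0, i.e., descent directions for a smooth f."""
    D = np.asarray(D, dtype=float)
    return D[D @ grad < 0.0]

n = 3
D_coord = np.vstack([np.eye(n), -np.eye(n)])          # {+e_k, -e_k}: 2n vectors
D_minimal = np.vstack([np.eye(n), -np.ones((1, n))])  # {e_k, -sum_k e_k}: n + 1 vectors
D_orthant = np.eye(n)                                  # {+e_k}: NOT positive spanning

rng = np.random.default_rng(0)
gradients = rng.normal(size=(1000, n))
coord_ok = all(len(descent_directions(D_coord, g)) > 0 for g in gradients)
minimal_ok = all(len(descent_directions(D_minimal, g)) > 0 for g in gradients)
# For a gradient with all components positive, no +e_k is a descent direction.
orthant_count = len(descent_directions(D_orthant, np.ones(n)))
print(coord_ok, minimal_ok, orthant_count)   # True True 0
```

The minimal basis needs only $n + 1$ poll points instead of $2n$. The price is that its directions are less evenly spread, so the best available descent direction can be closer to orthogonal to $-\nabla f$.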

Though pattern search optimization algorithms typically require many more 
function evaluations than gradient-based techniques, the computations in both the 
search and poll steps can be performed in a distributed fashion. On top of that, the use of surrogates, as is the case for the SMF, generally accelerates the entire optimization process noticeably.
3.4.5 Exploitation versus Exploration

The surrogate-based optimization framework starts from an initial surrogate model 
which is updated using the high-fidelity model data that is accumulated in the optimization process. In particular, the high-fidelity model has to be evaluated for verification at any new design $x^{(i)}$ provided by the surrogate model. The new
points at which we evaluate the high-fidelity model are sometimes referred to as 
infill points [4]. We reiterate that this data can be used to enhance the surrogate. 
The selection of the infill points is also known as adaptive sampling [4]. 

Infill points in approximation model management optimization, space mapping 
and manifold mapping are in practice selected through local optimization of the 
surrogate (global optimization for problems with a medium/large number of vari- 
ables and even relatively inexpensive surrogates can be a time-consuming proce- 
dure). The new infill points in the surrogate management framework are taken 
based only on high-fidelity cost function improvement. As we have seen in this 
section, the four surrogate-based optimization approaches discussed are supported 
by local optimality theoretical results. In other words, these methodologies intrinsically aim at the exploitation of a certain region of the design space (the neighborhood of a first-order stationary point). If the surrogate is valid globally, the first iterations of these four optimization approaches can be used to avoid being trapped
in unsatisfactory local solutions (i.e., global exploration steps). 

The exploration of the design space implies in most cases a global search. If the 
underlying objective function is non-convex, exploration usually boils down to 
performing a global sampling of the search space, for example, by selecting those 
points that maximize some estimation of the error associated to the surrogate con- 
sidered [4]. It should be stressed that global exploration is often impractical, espe- 
cially for computationally expensive cost functions with a medium/large number 
of optimization variables (more than a few tens). Additionally, pure exploration 
may not be a good approach for updating the surrogate in an optimization context, 
since a great amount of computing resources can be spent in modeling parts of the 
search space that are not interesting from an optimal design point of view. 

Therefore, it appears that in optimization there should be a balance between ex- 
ploitation and exploration. As suggested in [4], this tradeoff could be formulated 
in the context of surrogate-based optimization, for example, by means of a bi-objective optimization problem (with a global measure of the error associated with the surrogate as the second objective function), by maximizing the probability of improvement upon the best observed objective function value, or through the maximization of the expected cost function improvement. As mentioned above, these
hybrid approaches will find difficulties in performing an effective global search in 
designs with a medium/large number of optimization variables. 
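Two of the infill criteria mentioned above have closed forms when the surrogate provides a Gaussian prediction with mean $\mu(x)$ and standard deviation $\sigma(x)$, as kriging does. Let $f_{\min}$ be the best observed objective value and $z = (f_{\min} - \mu)/\sigma$. The probability of improvement is then $\Phi(z)$, and the expected improvement is $(f_{\min} - \mu)\Phi(z) + \sigma\phi(z)$. A minimal sketch for minimization follows. The function names are ours, and the Gaussian predictive assumption is the standard kriging one.

```python
import math

def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_min):
    """P[Y < f_min] for a Gaussian prediction Y ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu < f_min else 0.0
    return _cdf((f_min - mu) / sigma)

def expected_improvement(mu, sigma, f_min):
    """E[max(f_min - Y, 0)] for a Gaussian prediction Y ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    return (f_min - mu) * _cdf(z) + sigma * _pdf(z)

f_min = 1.0
# Candidate A: predicted slightly better than f_min, with little uncertainty (exploitation).
# Candidate B: predicted worse than f_min, but with large uncertainty (exploration).
pi_a, ei_a = probability_of_improvement(0.9, 0.01, f_min), expected_improvement(0.9, 0.01, f_min)
pi_b, ei_b = probability_of_improvement(1.2, 1.0, f_min), expected_improvement(1.2, 1.0, f_min)
print(pi_a, ei_a)   # PI close to 1, EI close to 0.1
print(pi_b, ei_b)   # PI about 0.42, EI about 0.31
```

The probability of improvement favors the well-explored candidate A with a small predicted gain. The expected improvement favors the uncertain candidate B. This difference is one concrete form of the exploitation/exploration tradeoff.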

3.5 Final Remarks 

In this chapter, an overview of surrogate modeling, with an emphasis on optimiza- 
tion, has been presented. Surrogate-based optimization plays an important role in 
contemporary engineering design, and the importance of this role will most likely
increase in the near future. One of the reasons for this increase is the fact that 
computer simulations have become a major design tool in most engineering areas. 
In order for these simulations to be sufficiently accurate, more and more phenom- 
ena have to be captured. This level of sophistication renders simulations computa- 
tionally expensive, particularly when they deal with the time-varying three- 
dimensional structures considered in many engineering fields. Hence, evaluation 
times of several days, or even weeks, are nowadays not uncommon. The direct use 
of CPU-intensive numerical models in some off-the-shelf automated optimization 
procedures (e.g., gradient-based techniques with approximate derivatives) is very 
often prohibitive. Surrogate-based optimization can be a very useful approach in 
this context, since, apart from reducing significantly the number of high-fidelity 
expensive simulations in the whole design process, it also helps in addressing im- 
portant high-fidelity cost function issues (e.g., presence of discontinuities and/or 
multiple local optima). 
Chapter 4

Derivative-Free Optimization 

Oliver Kramer, David Echeverria Ciaurri, and Slawomir Koziel 



Abstract. In many engineering applications it is common to find optimization 
problems where the cost function and/or constraints require complex simulations. 
Though it is often, but not always, theoretically possible in these cases to ex- 
tract derivative information efficiently, the associated implementation procedures 
are typically non-trivial and time-consuming (e.g., adjoint-based methodologies). 
Derivative-free (non-invasive, black-box) optimization has lately received consid- 
erable attention within the optimization community, including the establishment of 
solid mathematical foundations for many of the methods considered in practice. In this chapter we describe some of the most prominent derivative-free optimization techniques. Our depiction concentrates first on local optimization, such as pattern search techniques and other methods based on interpolation/approximation. Then, we survey a number of global search methodologies, and finally give guidelines on constraint handling approaches.



4.1 Introduction 

Efficient optimization very often hinges on the use of derivative information of 
the cost function and/or constraints with respect to the design variables. In the last 
decades, the computational models used in design have increased in sophistication to such an extent that it is common to find situations where (reliable) derivative information is not available. Although in simulation-based design there are methodologies that allow one to extract derivatives with a modest amount of additional computation, these approaches are in general invasive with respect to the simulator (e.g., adjoint-based techniques [1]), and thus require precise knowledge of the simulation code and access to it. Moreover, obtaining derivatives in this intrusive way often implies significant coding (not only at the code development stage, but also subsequently, when maintaining or upgrading the software). Consequently, many simulators simply yield, as output, the data needed for the cost function and/or constraint values. Furthermore, optimal design currently has a clear multidisciplinary nature, so it is reasonable to expect that some components of the overall simulation do not include derivatives. This situation is even more likely when commercial software is used, since then the source code is typically inaccessible.

In this chapter we review a number of techniques that can be applied to gener- 
ally constrained continuous optimization problems for which the cost function and 
constraint computation can be considered as a black box system. We wish to clearly 
distinguish between methods that aim at providing just a solution (local optimization; see Section 4.3) and approaches that try to avoid being trapped in local optima (global optimization; see Section 4.4). Local optimization is much easier to handle than global optimization, since, in general, there is no algorithmically suitable characterization of global optima. As a consequence, there are more theoretical results of practical relevance for local than for global optimizers (e.g., convergence conditions and rate). For more details on theoretical aspects of derivative-free optimization we strongly recommend both the review [2] and the book [3]. The techniques are described for continuous variables, but it is possible to apply, with care, extensions of many of them to mixed-integer scenarios. However, since mixed-integer nonlinear programming is still an emerging area (especially in simulation-based optimization), we prefer not to include recommendations for this case.

In some situations, numerical derivatives can be computed fairly efficiently (e.g., 
via a computer cluster), and still yield results that can be acceptable in practice. 
However, if the function/constraint evaluations are even moderately noisy, numerical derivatives are usually not useful. Methods that rely on approximate derivatives are not derivative-free techniques per se. Nevertheless, in the absence of noise, for example, they can address optimization in a black-box fashion. We should note that, in addition to their inherent computational costs, numerical derivatives very often require tuning the derivative approximation together with the simulation tolerances, and this is not always easy to do. Implicit filtering [4, 5] may alleviate some of these issues. This approach is essentially a gradient-based procedure where the derivative approximation is improved as the optimization progresses. Implicit filtering has been recommended for problems with multiple local optima (e.g., noisy cost functions). For more details on gradient-based methodologies the reader is encouraged to consult nonlinear optimization references (for example, [6, 7]).
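The effect of noise on numerical derivatives can be seen in a few lines. The sketch below uses a hypothetical simulator output $f(x) = x^2$, optionally perturbed by uniform noise of amplitude $\epsilon = 10^{-3}$, and a forward difference with step $h$. The error behaves roughly like $h\,|f''|/2 + 2\epsilon/h$. A step that is well suited to the noise-free function is therefore useless for the noisy one.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_level = 1e-3   # hypothetical noise amplitude of the simulator output

def f_clean(x):
    return x ** 2

def f_noisy(x):
    return x ** 2 + noise_level * rng.uniform(-1.0, 1.0)

def forward_difference(f, x, h):
    return (f(x + h) - f(x)) / h

exact = 2.0  # f'(1) for f(x) = x^2
err_clean = abs(forward_difference(f_clean, 1.0, 1e-6) - exact)
# average absolute error over repeated noisy evaluations, for a small and a large step
err_noisy_small_h = np.mean([abs(forward_difference(f_noisy, 1.0, 1e-6) - exact) for _ in range(100)])
err_noisy_large_h = np.mean([abs(forward_difference(f_noisy, 1.0, 1e-1) - exact) for _ in range(100)])
print(err_clean, err_noisy_small_h, err_noisy_large_h)
```

Enlarging $h$ trades noise amplification for truncation error. Choosing that step together with the simulation tolerance is the tuning issue mentioned above.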
Many derivative-free methods are easy to implement, and this feature makes them attractive when approximate solutions are required in a short time frame. An obvious but often neglected point is that the computational cost of an iteration of an algorithm is not always a good estimate of the time needed within a project (measured from its inception) to obtain satisfactory results. However, one important drawback of derivative-free techniques (when compared, for example, with adjoint-based approaches) is the limitation on the number of optimization variables that can be handled. For example, in [3] and [2] the limit given is a few hundred variables. This limit in the problem size can be overcome, at least to some extent, if one is not restricted to a single sequential environment. For some of the algorithms, though, adequately exploiting parallelism may be difficult or even impossible. When distributed computing resources are scarce or not available, and for simulation-based designs with significantly more than a hundred optimization variables, some form of parameter reduction is mandatory. In these cases, surrogates or reduced order models [8] for the cost function and constraints are desirable approaches. Fortunately, suitable parameter and model order reduction techniques can often be found in many engineering applications, although they may give rise to inaccurate models. We should add that, even in theory, some derivative-free optimization algorithms perform well with nonsmooth/noisy cost functions, as long as the problem can be reasonably approximated by a smooth function (see [2], Section 10.6). This has also been observed in practice [3].

In the last decade, there has been a renaissance of gradient-free optimization methodologies, and they have been successfully applied in a number of areas. Examples of this are ubiquitous. To name a few, derivative-free techniques have been used within molecular geometry [10], aircraft design [11, 12], hydrodynamics [13, 14], medicine [15, 16] and earth sciences [17, 18]. These references include generally constrained cases with derivative-free objective functions and constraints, continuous and integer optimization variables, and local and global approaches. In spite of this apparent abundance of results, we should not disregard the general recommendation (see [3, 2]) of strongly preferring gradient-based methods if accurate derivative information can be computed reasonably efficiently and globally.

This chapter is structured as follows. In Section 4.2 we introduce the general problem formulation and notation. A number of derivative-free methodologies for unconstrained continuous optimization are presented in the next two sections. Section 4.3 refers to local optimization, and Section 4.4 is devoted to global optimization. Guidelines for extending all these algorithms to generally constrained optimization are given in Section 4.5. We bring the chapter to an end with some conclusions and recommendations.

4.2 Derivative-Free Optimization 

A general single-objective optimization problem can be formally stated as:

$$\min_{x \in \Omega} f(x) \quad \text{subject to} \quad g(x) \le 0, \tag{4.1}$$

where $f(x)$ is the objective function, $x \in \mathbb{R}^n$ is the vector of control variables, and $g: \mathbb{R}^n \to \mathbb{R}^m$ represents the nonlinear constraints in the problem. Bound and linear constraints are included in the set $\Omega \subset \mathbb{R}^n$. For many approaches it is natural to treat separately any constraints for which derivatives are available. In particular, bound and linear constraints, and any other structure that can be exploited, should be handled explicitly. For example, nonlinear least-squares problems should exploit that inherent structure whenever possible (see e.g. [21]). We are interested in applications for which the objective function and constraint values are computed using the output from a simulator, rendering function evaluations expensive and derivatives unavailable.

We will begin by discussing some general issues with respect to optimization with derivatives, since they are highly relevant to the derivative-free case.
Essentially all approaches to the former are somewhere between steepest descent 
and Newton's method, or equivalently use something that is between a linear and 
a quadratic model. This is reinforced by the realization that almost all practical 
computation is linear at its core, and (unconstrained) minima are characterized by 
the gradient being zero, and quadratic models give rise to linear gradients. In fact, 
theoretically at least, steepest descent is robust but slow (and in fact sometimes so 
slow that in practice it is not robust) whereas Newton's method is fast but may have 
a very small radius of convergence. That is, one needs to start close to the solu- 
tion. It is also computationally more demanding. Thus in a sense, most practical 
unconstrained algorithms are intelligent compromises between these two extremes. 
Although somewhat oversimplified, one can say that the constrained case is dealt with by maintaining feasibility, determining which constraints are tight, linearizing these constraints and then solving the reduced problem determined by these linearizations.
Therefore, some reliable first-order model is essential, and for faster convergence, 
something more like a second-order model is desirable. In the unconstrained case 
with derivatives these are typically provided by a truncated Taylor series model (in 
the first-order case) and some approximation to a truncated second-order Taylor se- 
ries model. A critical property of such models is that as the step sizes become small 
the models become more accurate. In the case where derivatives, or good approx- 
imations to them, are not available, clearly, one cannot use truncated Taylor series 
models. It thus transpires that if, for example, one uses interpolation or regression models that depend only on function values, one can no longer guarantee that the models become more accurate as the step sizes become small. Thus one has to have
some explicit way to make this guarantee, at least approximately. It turns out that 
this is usually done by considering the geometry of the points at which the func- 
tion is evaluated, at least, before attempting to decrease the effective maximum step 
size. In pattern search methods, this is done by explicitly using a pattern with good geometry, for example, a regular mesh that one only scales while maintaining its a priori good geometry.

In the derivative case the usual stopping criteria relate to the first-order optimality conditions. In the derivative-free case, one does not explicitly have these, since they require (approximations to) the derivatives. At this stage we just remark that any criteria used should relate to the derivative case conditions, so, for example, one needs something like a reasonable first-order model, at least asymptotically.




4.3 Local Optimization 

Local methods form the kernel of many optimizers. This is not surprising, since, as we already mentioned, there is no suitable algorithmic characterization of global optima unless one considers special situations such as where all local optima are global, as for example in convex minimization problems. In this section we concentrate on local search methods based on pattern search and also on interpolation and approximation models. Some constraint handling procedures are described in Section 4.5.

4.3.1 Pattern Search Methods 

Pattern search methods are optimization procedures that evaluate the cost function 
in a stencil-based fashion determined by a set of directions with intrinsic prop- 
erties meant to be desirable from a geometric/algebraic point of view. This sten- 
cil is sequentially modified as iterations proceed. The recent popularity of these 
schemes is due in part to the development of a mathematically sound convergence 
theory [2, 3]. Moreover, they are attractive because they can relatively easily leverage the widespread availability of parallel computing resources. However, most published computational results do not exploit parallelism.

4.3.1.1 Generalized Pattern Search 



Generalized pattern search (GPS; [22, 23]) refers to a whole family of optimization methods. GPS relies on polling (local exploration of the cost function on the pattern) but may be enhanced by additional searches, see [23]. At any particular iteration a stencil (pattern) is centered at the current solution. The stencil comprises a set of directions such that, whenever the gradient is nonzero, at least one direction is a descent direction. This is also called a generating set (see e.g. [2]). If any of the points in the stencil represents an improvement in the cost function, the stencil is moved to one of them. Otherwise,
the stencil size is decreased. The optimization progresses until some stopping crite- 
rion is satisfied (typically, a minimum stencil size). Generalized pattern search can 
be further generalized by polling in an asymptotically dense set of directions (this 
set varies with the iterations). The resulting algorithm is the mesh adaptive direct 
search (MADS; [24]). In particular, some generalization of a simple fixed pattern is 
essential for constrained problems. The GPS method parallelizes naturally since, at a 
particular iteration, the objective function evaluations at the polling points can be ac- 
complished in a distributed fashion. The method typically requires on the order of n 
function evaluations per iteration (where n is the number of optimization variables). 

4.3.1.2 Hooke-Jeeves Direct Search

The Hooke-Jeeves direct search (HJDS; [25]) is another pattern search method. It was the first method to use the term 'direct search' and to take advantage of the idea of a pattern. HJDS is based on two types of moves: exploratory and pattern. These moves are illustrated in Figure 4.1 for some optimization iterations in $\mathbb{R}^2$.

[Figure legend: exploratory move; pattern move; improvement; no improvement]

Fig. 4.1 Illustration of exploratory and pattern moves in Hooke-Jeeves direct search (modified from [19]). The star represents the optimum.

The iteration starts with a base point $x_0$ and a given step size. During the exploratory move, the objective function is evaluated at successive changes of the base point in the search (for example, coordinate) directions. All the directions are polled sequentially and in an opportunistic way. This means that if $d_1 \in \mathbb{R}^n$ is the first search direction, the first function evaluation is at $x_0 + d_1$. If this represents an improvement in the cost function, the next point polled will be, assuming $n > 1$, $x_0 + d_1 + d_2$, where $d_2$ is the second search direction. Otherwise the point $x_0 - d_1$ is polled. Upon success at this last point, the search proceeds with $x_0 - d_1 + d_2$, and otherwise with $x_0 + d_2$. The exploration continues until all search directions have been considered. If after the exploratory step no improvement in the cost function is found, the step size is reduced. Otherwise, a new point $x_1$ is obtained, but instead of centering another exploratory move at $x_1$, the algorithm performs the pattern move. This is a more aggressive step that moves further in the underlying successful direction. After the pattern move, the next polling center $x_2$ is set at $x_0 + 2(x_1 - x_0)$. If the exploratory move at $x_2$ fails to improve upon $x_1$, a new polling is performed around $x_1$. If this again yields no cost function decrease, the step size is reduced, keeping the polling center at $x_1$.
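The exploratory and pattern moves just described can be sketched as follows. The sketch assumes coordinate search directions and a step-size reduction factor of 0.5, both illustrative choices. The test function is hypothetical.

```python
import numpy as np

def exploratory_move(f, base, f_base, step):
    """One opportunistic pass over the coordinate directions around `base`."""
    x, fx = base.copy(), f_base
    for i in range(x.size):
        for sign in (1.0, -1.0):          # try x + step*e_i first, then x - step*e_i
            trial = x.copy()
            trial[i] += sign * step
            f_trial = f(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break                     # keep this change, go to the next direction
    return x, fx

def hooke_jeeves(f, x0, step=0.5, step_min=1e-6, shrink=0.5):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while step >= step_min:
        x_new, f_new = exploratory_move(f, x, fx, step)
        if f_new >= fx:
            step *= shrink                # exploratory move failed around x: reduce step
            continue
        # Pattern moves: jump to x_old + 2 (x_new - x_old) and explore there,
        # as long as the result improves on the last successful point.
        while f_new < fx:
            x_old, x, fx = x, x_new, f_new
            x_pat = x + (x - x_old)
            x_new, f_new = exploratory_move(f, x_pat, f(x_pat), step)
        # The pattern move failed: the outer loop polls around x again.
    return x, fx

f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2   # hypothetical test function
x_hj, f_hj = hooke_jeeves(f, [0.0, 0.0])
print(x_hj, f_hj)
```

Every evaluation depends on the outcome of the previous one. This reflects the serial nature of the algorithm noted below.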

Notice the clear serial nature of the algorithm. This makes HJDS a reason- 
able pattern search option when distributed computing resources are not available. 
Because of the pattern move, HJDS may also be beneficial in situations where an optimum is far from the initial guess. One could argue that pattern search techniques should initially use a relatively large stencil size, in the hope that this enables them to avoid some local minima and, perhaps, provides some robustness against noisy cost functions.
4.3.2 Derivative-Free Optimization with Interpolation and Approximation Models

The other major approach to derivative-free optimization is based on building mod- 
els that are meant to approximate the functions and then make use of derivative 
methods on the models. The advantage is that one is trying to take account of the 
shape of the function rather than naively just using the function evaluations alone. 
As our introductory remarks in Section 4.2 suggest, we can expect our models to be at least first-order models or, better still, second-order.

A major drawback of this approach is that, since the models are not based upon 
an a priori pattern, as with just polling, the geometry of the sample points used re- 
quires special attention. Additionally, one pays for the extra sophistication of these 
methods in that they are not obviously parallelizable. Some of the better known algorithms in this class include DFO [3], NEWUOA [26] and BOOSTERS [27]. The basic ideas will be given here, but it is recommended that the diligent reader consult Chapters 3-6 of [3].

First of all, what does good geometry mean? Essentially, if one wants to consider interpolation by a polynomial of degree $d = 1$, that is, linear interpolation, one needs $n + 1$ points, and good geometry means that they do not lie on or close to a linear surface (a hyperplane). Similarly, for interpolation by a polynomial of degree $d = 2$, that is, quadratic interpolation, one needs $(n + 1)(n + 2)/2$ points, and good geometry means that they do not lie on or close to a quadratic or linear surface. The extension to higher degrees is clear. One can also
see why the problem goes away if one works with a suitable pattern, as in a pattern 
search method. 
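A small sketch of what good geometry means for linear interpolation follows. The coefficients of $m(x) = c + g^{T}x$ solve a linear system whose rows are $[1, x_i^{T}]$. Nearly collinear points in $\mathbb{R}^2$ make that system ill-conditioned. Using the condition number as a geometry indicator is a simplification made here for illustration; more refined poisedness measures exist. The helper names and the test function are hypothetical.

```python
import numpy as np

def interpolation_matrix(points):
    """Rows [1, x_i^T] of the linear interpolation system for n + 1 points in R^n."""
    P = np.asarray(points, dtype=float)
    return np.hstack([np.ones((P.shape[0], 1)), P])

def fit_linear_model(points, fvals):
    """Coefficients (c, g) of m(x) = c + g^T x interpolating fvals at the points."""
    coef = np.linalg.solve(interpolation_matrix(points), np.asarray(fvals, dtype=float))
    return coef[0], coef[1:]

def quadratic_points_needed(n):
    """Number of points, (n + 1)(n + 2)/2, for full quadratic interpolation in R^n."""
    return (n + 1) * (n + 2) // 2

good = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
poor = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0 + 1e-6]]   # almost on a straight line

cond_good = np.linalg.cond(interpolation_matrix(good))
cond_poor = np.linalg.cond(interpolation_matrix(poor))

f = lambda x: 1.0 + 2.0 * x[0] - 3.0 * x[1]          # hypothetical linear function
c, g = fit_linear_model(good, [f(p) for p in good])
print(cond_good, cond_poor, c, g, quadratic_points_needed(2))
```

With the poor set, tiny errors in the function values are amplified by a factor of the order of the condition number in the model coefficients. This is the "bad model as a result of poor geometry" that trust-region methods must guard against.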

Now, all three methods mentioned above are trust-region based. For an introduction to trust-region techniques the reader is referred to [7], or to [9] for a monographic volume. In the case with derivatives the essential ingredients are the following. Starting at a given point $x_0$, one has a region about that point, coined the trust region, whose radius is denoted by $\Delta_0$. The trust region is typically a ball in the Euclidean norm or in the infinity norm. One then requires a model $m(x)$ for the true objective function that is relatively easy to minimize within the trust region (e.g., a truncated first-order Taylor series or an approximation to a truncated second-order Taylor series, about the current point). A search direction from the current point is determined based upon the model, and one (approximately) minimizes the model within the trust region.

The trust region can be updated in the following manner. Suppose $y_1$ is the approximate minimizer of the model within the trust region of radius $\Delta_0$. We then compare the actual reduction with the predicted reduction by considering the ratio

$$\rho = \frac{f(x_0) - f(y_1)}{m(x_0) - m(y_1)}.$$

Then typically one assigns some updating strategy to the trust-region radius $\Delta_0$, such as

$$\Delta_1 = \begin{cases} 2\,\Delta_0, & \text{if } \rho > 0.9,\\ \Delta_0, & \text{if } 0.1 \le \rho \le 0.9,\\ 0.5\,\Delta_0, & \text{if } \rho < 0.1, \end{cases}$$

where $\Delta_1$ denotes the updated radius. In the first two cases $x_1 = y_1$, and in the third case $x_1 = x_0$.
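The ingredients above can be combined in a minimal trust-region sketch for the derivative case, using the radius update just given. The model is minimized only approximately, with the Cauchy point. This is the minimizer of the quadratic model along the steepest-descent direction within the ball, one standard choice rather than the one used by any particular method cited here. The test function is a hypothetical convex quadratic.

```python
import numpy as np

def update_radius(rho, delta):
    """2*delta if rho > 0.9; delta if 0.1 <= rho <= 0.9; 0.5*delta if rho < 0.1."""
    if rho > 0.9:
        return 2.0 * delta
    if rho >= 0.1:
        return delta
    return 0.5 * delta

def cauchy_step(g, B, delta):
    """Minimizer of m(s) = g.s + 0.5 s.B.s along -g subject to ||s|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    return -tau * delta / gnorm * g

def trust_region(f, grad, hess, x0, delta=1.0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        B = hess(x)
        s = cauchy_step(g, B, delta)
        y = x + s
        predicted = -(g @ s + 0.5 * s @ B @ s)   # m(x0) - m(y1) > 0
        rho = (f(x) - f(y)) / predicted
        if rho >= 0.1:                            # first two cases: x1 = y1
            x = y
        delta = update_radius(rho, delta)         # third case keeps x1 = x0
    return x

f = lambda x: x[0] ** 2 + 5.0 * x[1] ** 2           # hypothetical test function
grad = lambda x: np.array([2.0 * x[0], 10.0 * x[1]])
hess = lambda x: np.diag([2.0, 10.0])
x_tr = trust_region(f, grad, hess, [3.0, -2.0])
print(x_tr)
```

Here the model is exact, so $\rho = 1$ and the radius only grows. With an inaccurate model, $\rho$ drops and the radius shrinks. In the derivative-free setting, this is where poor sample geometry must be distinguished from a genuinely difficult function.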

Thus, although oversimplified, if we are using Taylor series approximations for our models, within the trust-region management scheme one can ensure convergence to a solution satisfying first-order optimality conditions [20]. Perhaps the most important
difference once derivatives are not available is that we cannot take Taylor series 
models and so, in general, optimality can no longer be guaranteed. In fact, we have 
to be sure that when we reduce the trust-region radius it is because of the problem 
and not just a consequence of having a bad model as a result of poor geometry of 
the sampling points. So it is here that one has to consider the geometry. Fortunately, 
it can be shown that one can constructively ensure good geometry, and with that, 
support the whole derivative-free approach with convergence to solutions that satisfy 
first-order optimality conditions. For details see [3], Chapter 6. 



4.4 Global Optimization 

In the previous section we have concentrated on local search methods. Unfortu- 
nately, most real-world problems are multimodal, and global optima are generally 
extremely difficult to obtain. Local search methods find local optima that are not 
guaranteed to be global. Here we will give a short survey of global optimization 
methods. However, the reader should take note of the following. In practice, often 
good local optima suffice. If one is considering even a modest number of variables, 
say fifty, it is generally very difficult, if not impossible, to ensure convergence to 
a provable global solution, in a reasonable length of time, even if derivatives are 
available, not to mention in the derivative-free case. Almost all algorithms designed 
to determine local optima are significantly more efficient than global methods. 

Many successful methods in global optimization are based on stochastic components, as these allow the search to escape from local optima and overcome premature stagnation. Well-known families of stochastic global optimization methods are evolutionary algorithms, estimation of distribution algorithms, particle swarm optimization, and differential evolution. Further heuristics known in the literature are simulated annealing [28, 29], tabu search [30, 31], ant colony optimization [32, 33], and artificial immune systems [34, 35]. In this section, we concentrate on the first four classes of methods, which have been successful in a number of practical applications.

4.4.1 Evolutionary Algorithms 

A history of more than forty years of active research on evolutionary computation indicates that stochastic optimization algorithms are an important class of derivative-free search methodologies. The separate development of evolutionary algorithms (EAs) in the United States and Europe led to different kinds of algorithmic variants. Genetic algorithms were developed by John Holland in the United States at the beginning of the seventies. Holland's intention was to exploit adaptive behavior. In his book Adaptation in Natural and Artificial Systems [36] he describes the development of genetic algorithms (GAs). His original algorithm is today known as the simple GA. Evolutionary programming by Fogel, Owens and Walsh [37] was originally designed to evolve deterministic finite automata, but has since been extended to numerical optimization. Evolution strategies (ES) were developed by Rechenberg and Schwefel in the middle of the sixties in Germany [38, 39, 40]. In the following, we introduce the idea of evolutionary optimization, which is closely related to evolution strategies.

Start
  Initialize solutions x_i of population P
  Evaluate objective function for the solutions x_i in P
  Repeat
    For i = 1 To λ
      Select ρ parents from P
      Create new x_i by recombination
      Mutate x_i
      Evaluate objective function for x_i
      Add x_i to P'
    Next
    Select μ parents from P' and form new P
  Until termination condition
End

Fig. 4.2 Pseudocode of a generic evolutionary algorithm.

4.4.1.1 Algorithmic Framework 

The basis of evolutionary search is a population $\mathcal{P} := \{\mathbf{x}_1, \ldots, \mathbf{x}_\mu\}$ of candidate solutions, also called individuals. Figure 4.2 shows the pseudocode of a general evolutionary algorithm. The optimization process takes three steps. In the first step, the recombination operator (see Section 4.4.1.2) selects $\rho$ parents and combines them to obtain new solutions. In the second step, the mutation operator (see Section 4.4.1.3) adds random noise to the preliminary candidate solution. The objective function $f(\mathbf{x})$ is interpreted in terms of the quality of the individuals, and in EA lexicon is called fitness. The fitness of the new offspring solution is evaluated. All individuals of a generation form the new population $\mathcal{P}'$. In the third step, when $\lambda$ solutions have been produced, $\mu$ individuals, with $\mu < \lambda$, are selected (see Section 4.4.1.4) and form the new parental population of the following generation. The process is repeated until a termination condition is reached. Typical termination conditions are the accomplishment of a certain solution quality, or an upper bound on the number of generations. We now concentrate on the stochastic operators that are often used in evolutionary computation.
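The loop of Fig. 4.2 can be sketched in Python as a $(\mu, \lambda)$-EA. This is only one possible instance, under our own assumptions: intermediate recombination (4.3), isotropic Gaussian mutation with a constant step size, comma selection, a fixed generation budget as the termination condition, and initialization in $[-5, 5]^n$. The names `evolutionary_algorithm` and `sphere` and all default parameter values are illustrative.

```python
import random


def sphere(x):
    """Simple test objective (to be minimized)."""
    return sum(xi * xi for xi in x)


def evolutionary_algorithm(f, n, mu=5, lam=30, rho=2, sigma=0.3,
                           generations=200, seed=0):
    """(mu, lambda)-EA following Fig. 4.2; minimizes f over R^n."""
    rng = random.Random(seed)
    # Initialize the parental population P with mu random solutions
    parents = [[rng.uniform(-5.0, 5.0) for _ in range(n)] for _ in range(mu)]
    for _ in range(generations):                 # Repeat ... Until (generation budget)
        offspring = []                           # P'
        for _ in range(lam):                     # For i = 1 To lambda
            chosen = rng.sample(parents, rho)    # select rho parents from P
            # intermediate recombination: component-wise mean of the parents
            child = [sum(p[i] for p in chosen) / rho for i in range(n)]
            # isotropic Gaussian mutation with a constant step size sigma
            child = [ci + rng.gauss(0.0, sigma) for ci in child]
            offspring.append((f(child), child))  # evaluate and add to P'
        offspring.sort(key=lambda pair: pair[0])
        # comma selection: the mu best offspring form the new P
        parents = [x for _, x in offspring[:mu]]
    return min(parents, key=f)


best = evolutionary_algorithm(sphere, 2)
print(best, sphere(best))
```

With comma selection the parents' own fitness values are never needed, so the initial evaluation step of Fig. 4.2 is implicit here; a plus-selection variant would have to store them.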

4.4.1.2 Recombination 

In biological systems, recombination, also known as crossover, mixes the genetic material of two parents. Most EAs also make use of a recombination operator and combine the information of two or more individuals into a new offspring solution. Hence, the offspring carries parts of the genetic material of its parents. The benefit of recombination is debated; two prominent explanations are the building block hypothesis by Goldberg [41, 42] and the genetic repair effect by Beyer [43].



Typical recombination operators for continuous representations are dominant and intermediate recombination. Dominant recombination randomly combines the genes of all parents. If we consider parents of the form $\mathbf{x} = (x_1, \ldots, x_n)$, dominant recombination with $\rho$ parents $\mathbf{x}^1, \ldots, \mathbf{x}^\rho$ creates the offspring vector $\mathbf{x}' = (x'_1, \ldots, x'_n)$ by random choice of the $i$-th component $x'_i$:

$$x'_i := x^k_i, \quad k \in \{1, \ldots, \rho\} \text{ chosen at random}. \tag{4.2}$$

Intermediate recombination is appropriate for integer and real-valued solution spaces. Given $\rho$ parents $\mathbf{x}^1, \ldots, \mathbf{x}^\rho$, each component of the offspring vector $\mathbf{x}'$ is the arithmetic mean of the components of all $\rho$ parents. Thus, the characteristics of descendant solutions lie between those of their parents:

$$x'_i := \frac{1}{\rho} \sum_{k=1}^{\rho} x^k_i. \tag{4.3}$$

Integer representations may require rounding procedures to produce intermediate integer solutions.
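Equations (4.2) and (4.3) translate directly into two small functions. The sketch below assumes that individuals are Python lists of equal length; the function names are our own.

```python
import random


def dominant_recombination(parents, rng=random):
    """Eq. (4.2): each component is copied from a randomly chosen parent."""
    n = len(parents[0])
    return [rng.choice(parents)[i] for i in range(n)]


def intermediate_recombination(parents):
    """Eq. (4.3): component-wise arithmetic mean of the rho parents."""
    rho, n = len(parents), len(parents[0])
    return [sum(p[i] for p in parents) / rho for i in range(n)]


# Usage with rho = 2 parents in n = 3 dimensions
parents = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
print(intermediate_recombination(parents))   # [1.0, 3.0, 5.0]
print(dominant_recombination(parents, random.Random(1)))
```

For integer representations, the intermediate result could simply be rounded, e.g. `[round(v) for v in intermediate_recombination(parents)]`; this is one of several possible rounding procedures.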

4.4.1.3 Mutation 

Mutation is the second main source of evolutionary change. According to Beyer and Schwefel [38], a mutation operator is supposed to fulfill three conditions. First, from each point in the solution space every other point must be reachable. Second, in unconstrained solution spaces a bias is disadvantageous, because the direction to the optimum is not known. Third, the mutation strength should be adjustable in order to adapt to solution space conditions. In the following, we concentrate on the well-known Gaussian mutation operator. We assume that solutions are vectors of real values. Random numbers based on the Gaussian distribution $\mathcal{N}(0, 1)$ satisfy these conditions in continuous domains. The Gaussian distribution can be used to describe many natural and artificial processes. In isotropic Gaussian mutation, each component of $\mathbf{x}$ is perturbed independently with a random number from a Gaussian distribution with zero mean and standard deviation $\sigma$.
Fig. 4.3 Gaussian mutation: isotropic Gaussian mutation (left) uses a single step size σ for all dimensions, multivariate Gaussian mutation (middle) allows independent step sizes for each dimension, and correlated mutation (right) introduces an additional rotation of the coordinate system



The standard deviation $\sigma$ plays the role of the mutation strength, and is also known as the step size. The step size $\sigma$ can be kept constant, but convergence can be improved by adapting $\sigma$ to the local solution space characteristics. In the case of high success rates, i.e., when a large fraction of offspring solutions are better than their parents, large step sizes are advantageous because they promote the exploration of the search space. This is often the case at the beginning of the search. Small step sizes are appropriate for low success rates, which are typical of later phases of the search, when the optimization history can be exploited while the optimum is approached. An example of adaptive step-size control is the 1/5-th success rule by Rechenberg [39], which increases the step size if the success rate is above 1/5 and decreases it if the success rate is below 1/5.
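A compact way to see the 1/5-th success rule at work is a (1+1)-ES with isotropic Gaussian mutation. In this sketch the success rate is measured over windows of 20 mutations and the step size is multiplied or divided by 0.85; both values, like the function name `one_plus_one_es`, are our own assumptions rather than settings prescribed by [39].

```python
import random


def sphere(x):
    return sum(xi * xi for xi in x)


def one_plus_one_es(f, x0, sigma=1.0, iterations=3000, window=20,
                    factor=0.85, seed=0):
    """(1+1)-ES with isotropic Gaussian mutation and the 1/5-th success rule."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    successes = 0
    for t in range(1, iterations + 1):
        # isotropic mutation: every component gets N(0, sigma^2) noise
        y = [xi + rng.gauss(0.0, sigma) for xi in x]
        fy = f(y)
        if fy < fx:                      # offspring better than parent: success
            x, fx = y, fy
            successes += 1
        if t % window == 0:              # adapt sigma every `window` mutations
            rate = successes / window
            if rate > 0.2:
                sigma /= factor          # success rate above 1/5: enlarge steps
            elif rate < 0.2:
                sigma *= factor          # success rate below 1/5: shrink steps
            successes = 0
    return x, fx, sigma


x, fx, sigma = one_plus_one_es(sphere, [3.0] * 5)
print(fx, sigma)
```

As the search approaches the optimum, the success rate drops and the rule keeps shrinking `sigma`, which is exactly the behavior described above.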

Isotropic Gaussian mutation can be extended to multivariate Gaussian mutation by introducing a step-size vector $\boldsymbol{\sigma}$ with independent step sizes $\sigma_i$. Figure 4.3 illustrates the differences between isotropic Gaussian mutation (left) and multivariate Gaussian mutation (middle). The multivariate variant considers a mutation ellipsoid that adapts flexibly to local solution space characteristics.

Even more flexibility can be obtained through the correlated mutation proposed by Schwefel [44], which aligns the coordinate system with the solution space characteristics. The mutation ellipsoid is rotated by means of an orthogonal matrix, and this rotation can be modified along the iterations. The rotated ellipsoid is also shown in Figure 4.3 (right). The covariance matrix adaptation evolution strategy (CMA-ES) and its derivatives [45, 46] are self-adaptive control strategies based on an automatic alignment of the coordinate system.
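For two dimensions, the multivariate and correlated variants can be sketched as sampling from an axis-parallel ellipsoid and then rotating it. The sketch assumes a single rotation angle (in $n$ dimensions an orthogonal matrix would take its place) and does not adapt `sigmas` or `angle`, whereas CMA-ES adapts the full covariance matrix. The function name is hypothetical.

```python
import math
import random


def correlated_mutation_2d(x, sigmas, angle, rng=random):
    """Gaussian mutation with per-dimension step sizes and a rotation."""
    # multivariate step: independent step size sigma_i for each dimension
    z = [rng.gauss(0.0, sd) for sd in sigmas]
    # rotate the axis-parallel mutation ellipsoid by `angle` (orthogonal matrix)
    c, s = math.cos(angle), math.sin(angle)
    dz = (c * z[0] - s * z[1], s * z[0] + c * z[1])
    return [x[0] + dz[0], x[1] + dz[1]]


# Steps elongated along the diagonal: sigma = (1.0, 0.1), rotated by 45 degrees
rng = random.Random(0)
pts = [correlated_mutation_2d([0.0, 0.0], (1.0, 0.1), math.pi / 4, rng)
       for _ in range(2000)]
print(sum(p[0] * p[1] for p in pts) / len(pts))  # positive: correlated steps
```

With `angle = 0` the function reduces to multivariate Gaussian mutation, and with equal entries in `sigmas` to isotropic mutation.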



4.4.1.4 Selection 



The counterpart of the variation operators mutation and recombination is selection. Selection gives the evolutionary search a direction. Based on the fitness, a subset of the population is selected, while the rest is rejected. In EAs the selection operator can be utilized at two points. Mating selection selects individuals for recombination. Another popular selection operator is survivor selection, corresponding to the Darwinian principle of survival of the fittest. Only the individuals selected by survivor selection are allowed to pass on their genetic material to the following generation. The deterministic strategies plus and comma selection choose the $\mu$ best solutions and are usually applied for survivor selection (plus selection is elitist, comma selection is not). Plus selection selects the $\mu$ best solutions from the union $\mathcal{P} \cup \mathcal{P}'$ of the last parental population $\mathcal{P}$ and the current offspring population $\mathcal{P}'$, and is denoted by $(\mu + \lambda)$-EA. In contrast to plus selection, comma selection, which is denoted by $(\mu, \lambda)$-EA, selects exclusively from the offspring population, neglecting the parental population even if it contains individuals with superior fitness. Though disregarding these apparently promising solutions may seem to be disadvantageous, this strategy, which prefers the new population to the old one, can be useful to avoid being trapped in unfavorable local optima.

The deterministic selection scheme described in the previous paragraph is a characteristic feature of ES. Most evolutionary algorithms use selection schemes containing random components. An example is fitness proportionate selection (also called roulette-wheel selection), popular in the early days of genetic algorithms [41]. Another example is tournament selection, a widely used selection scheme for EAs. Here, the candidate with the highest fitness out of a randomly chosen subset of the population is selected for the new population. Stochastic selection schemes permit the survival of less fit individuals and thus help to prevent premature convergence and to preserve genetic material that may come in handy at later stages of the optimization process.
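The selection schemes above can be sketched as follows for a minimization problem, where the highest fitness corresponds to the lowest objective value. The function names, the tournament size of 2, and the sampling of tournament members without replacement are our own choices.

```python
import random


def plus_selection(parents, offspring, mu, f):
    """(mu + lambda): the mu best of parents and offspring together."""
    return sorted(parents + offspring, key=f)[:mu]


def comma_selection(offspring, mu, f):
    """(mu, lambda): the mu best offspring; parents are discarded."""
    return sorted(offspring, key=f)[:mu]


def tournament_selection(population, f, k, size=2, rng=random):
    """k tournaments, each returning the best (lowest f) of `size` random members."""
    return [min(rng.sample(population, size), key=f) for _ in range(k)]


def objective(x):
    return x[0] ** 2


parents = [[0.1], [3.0]]
offspring = [[1.0], [2.0], [-0.5]]
print(plus_selection(parents, offspring, 2, objective))   # [[0.1], [-0.5]]
print(comma_selection(offspring, 2, objective))           # [[-0.5], [1.0]]
print(tournament_selection(offspring, objective, 3, rng=random.Random(0)))
```

Note how comma selection drops the parent `[0.1]` even though it is better than every offspring.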

4.4.2 Estimation of Distribution Algorithms 

Related to evolutionary algorithms are estimation of distribution algorithms (EDAs). They also operate with a set of candidate solutions. Similar to ES, a random set of points is initially generated, and the objective function is computed for all these points. The core of an EDA consists of successive steps in which the distribution of the best solutions within the population is estimated, and a new population is sampled from this estimated distribution.
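A minimal Gaussian EDA can be sketched as follows: the mean and standard deviation of the best solutions are estimated per dimension, and the next population is sampled from the resulting Gaussian. This sketch assumes independent dimensions and truncation selection of the 15 best out of 50; it deliberately omits the weighted variance estimator, adaptive variance scaling, and AMS discussed in the next paragraph, so it can exhibit the premature convergence they are meant to counter. All names and parameter values are illustrative.

```python
import random
import statistics


def sphere(x):
    return sum(xi * xi for xi in x)


def gaussian_eda(f, n, pop_size=50, n_best=15, generations=100, seed=0):
    """Univariate Gaussian EDA with truncation selection (minimizes f)."""
    rng = random.Random(seed)
    # random initial population; it is evaluated through the sorting below
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        best = sorted(pop, key=f)[:n_best]            # best solutions
        cols = list(zip(*best))                       # per-dimension samples
        means = [statistics.fmean(c) for c in cols]   # estimation step: mean ...
        stds = [statistics.pstdev(c) for c in cols]   # ... and standard deviation
        # sampling step: new population drawn from the estimated distribution
        pop = [[rng.gauss(m, s) for m, s in zip(means, stds)]
               for _ in range(pop_size)]
    return min(pop, key=f)


best = gaussian_eda(sphere, 3)
print(best, sphere(best))
```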

The principle has been extended in a number of different ways. Most EDAs make use of parametric distributions, i.e., the parameters of distribution functions are determined in the estimation step. The assumption of a Gaussian distribution is frequent in EDAs. EDAs may suffer from premature convergence. The weighted variance estimator introduced in [47] has been observed to alleviate this convergence issue. Adaptive variance scaling [48], in which the variance is increased if good solutions are found and decreased otherwise, has also been suggested to avoid early stagnation. The sampling process can be enhanced by anticipated mean shift (AMS; [49]). In this approach, about two thirds of the population are sampled regularly, and the rest is shifted in the direction of a previously estimated gradient. If this estimate is accurate, all the shifted individuals, together with part of the non-shifted individuals, may survive, and the variance estimate in the direction of the gradient could be larger than without AMS.

4.4.3 Particle Swarm Optimization 

Similar to evolutionary algorithms, particle swarm optimization (PSO) is a population approach with stochastic components. Introduced by Kennedy and Eberhart [50], it is inspired by the movement of natural swarms and flocks. The algorithm utilizes particles with a position $\mathbf{x}$ that corresponds to the optimization variables, and a velocity $\mathbf{v}$, which plays a role similar to that of the mutation strength in evolutionary computation. The principle of particle swarm optimization is based on the idea that the particles move in the solution space, influencing each other with stochastic changes, while previous successful solutions act as attractors.

In each iteration the position $\mathbf{x}$ of a particle is updated by adding the current velocity $\mathbf{v}$:

$$\mathbf{x}' := \mathbf{x} + \mathbf{v}. \tag{4.4}$$

The velocity is updated as follows:

$$\mathbf{v}' := \mathbf{v} + c_1 r_1 (\mathbf{x}^*_p - \mathbf{x}) + c_2 r_2 (\mathbf{x}^*_s - \mathbf{x}), \tag{4.5}$$

where $\mathbf{x}^*_p$ and $\mathbf{x}^*_s$ denote the best previous positions of the particle and of the swarm, respectively. The weights $c_1$ and $c_2$ are acceleration coefficients that determine the bias of the particle towards its own or the swarm history. The recommendation given by Kennedy and Eberhart is to set both parameters to one. The stochastic components $r_1$ and $r_2$ are uniformly drawn from the interval $[0, 1]$, and can be used to promote the global exploration of the search space.
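Equations (4.4) and (4.5) can be sketched as follows. Two details are our own assumptions and are not part of (4.5): velocities are clamped to `[-vmax, vmax]`, since the plain update with $c_1 = c_2 = 1$ has no damping term, and a single pair $r_1, r_2$ is drawn per particle and iteration. The position is moved with the freshly updated velocity. The names `pso` and `sphere` and the default values are illustrative.

```python
import random


def sphere(x):
    return sum(xi * xi for xi in x)


def pso(f, n, swarm_size=30, iterations=300, c1=1.0, c2=1.0,
        bounds=(-5.0, 5.0), vmax=1.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(swarm_size)]
    vs = [[0.0] * n for _ in range(swarm_size)]
    p_best = [x[:] for x in xs]                  # x*_p for every particle
    p_val = [f(x) for x in xs]
    k0 = min(range(swarm_size), key=p_val.__getitem__)
    s_best, s_val = p_best[k0][:], p_val[k0]     # x*_s of the swarm
    for _ in range(iterations):
        for k in range(swarm_size):
            x, v = xs[k], vs[k]
            r1, r2 = rng.random(), rng.random()  # uniform in [0, 1)
            for i in range(n):
                vi = (v[i] + c1 * r1 * (p_best[k][i] - x[i])
                      + c2 * r2 * (s_best[i] - x[i]))     # Eq. (4.5)
                v[i] = max(-vmax, min(vmax, vi))          # assumed clamping
                x[i] += v[i]                              # Eq. (4.4)
            fx = f(x)
            if fx < p_val[k]:                    # update the particle's best
                p_best[k], p_val[k] = x[:], fx
                if fx < s_val:                   # and possibly the swarm's best
                    s_best, s_val = x[:], fx
    return s_best, s_val


best_x, best_f = pso(sphere, 2)
print(best_x, best_f)
```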

4.4.4 Differential Evolution

Another population-based optimization approach is differential evolution (DE), originally introduced by Storn and Price [51]. Like the algorithms in the previous three subsections, DE exploits a set of candidate solutions (agents in DE lexicon). New agents are allocated in the search space by combining the positions of other existing agents. More specifically, an intermediate agent is generated by adding the scaled difference of two randomly chosen agents to a third agent from the current population. This temporary agent is then mixed with a predetermined target agent. The new agent is accepted for the next generation if and only if it reduces the objective function value.

The basic DE algorithm uses a random initialization. A new agent $\mathbf{y} = [y_1, \ldots, y_n]$ is created from the existing one $\mathbf{x} = [x_1, \ldots, x_n]$ as indicated below.

1. Three agents $\mathbf{a} = [a_1, \ldots, a_n]$, $\mathbf{b} = [b_1, \ldots, b_n]$ and $\mathbf{c} = [c_1, \ldots, c_n]$ are randomly extracted from the population (all distinct from each other and from $\mathbf{x}$).

2. A position index $p \in \{1, \ldots, n\}$ is determined randomly.

3. The position of the new agent $\mathbf{y}$ is computed by means of the following iteration over $i \in \{1, \ldots, n\}$:
   i) select a random number $r_i \in (0, 1)$ with uniform probability distribution;
   ii) if $i = p$ or $r_i < CR$, let $y_i = a_i + F(b_i - c_i)$; otherwise let $y_i = x_i$. Here, $F \in [0, 2]$ is the differential weight and $CR \in [0, 1]$ is the crossover probability, both defined by the user.

4. If $f(\mathbf{y}) < f(\mathbf{x})$, then replace $\mathbf{x}$ by $\mathbf{y}$; otherwise reject $\mathbf{y}$ and keep $\mathbf{x}$.
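Steps 1 to 4 can be sketched as follows. The defaults $F = 0.8$, $CR = 0.9$, a population of 20 agents, and initialization in $[-5, 5]^n$ are our own illustrative choices; indices run from 0 in Python, and each improved agent replaces its target immediately within the generation.

```python
import random


def sphere(x):
    return sum(xi * xi for xi in x)


def differential_evolution(f, n, pop_size=20, F=0.8, CR=0.9,
                           generations=200, bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    agents = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    values = [f(a) for a in agents]
    for _ in range(generations):
        for j in range(pop_size):
            x = agents[j]                                  # target agent
            others = [agents[k] for k in range(pop_size) if k != j]
            a, b, c = rng.sample(others, 3)                # step 1
            p = rng.randrange(n)                           # step 2
            y = []
            for i in range(n):                             # step 3
                r = rng.random()                           # i) uniform number
                if i == p or r < CR:                       # ii) crossover test
                    y.append(a[i] + F * (b[i] - c[i]))
                else:
                    y.append(x[i])
            fy = f(y)
            if fy < values[j]:                             # step 4: greedy choice
                agents[j], values[j] = y, fy
    best = min(range(pop_size), key=values.__getitem__)
    return agents[best], values[best]


best_x, best_f = differential_evolution(sphere, 2)
print(best_x, best_f)
```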

Although DE resembles some other stochastic optimization techniques, unlike traditional EAs, DE perturbs the current generation of vectors with scaled differences of two randomly selected agents. As a consequence, no separate probability distribution has to be used, and thus the scheme presents some degree of self-organization. Additionally, DE is simple to implement, uses very few control parameters, and has been observed to perform satisfactorily on a number of multi-modal optimization problems [52].

4.5 Guidelines for Generally Constrained Optimization 

We now describe nonlinear constraint handling techniques that can be combined with the optimization methods presented in Sections 4.3 and 4.4.

4.5.1 Penalty Functions 

The penalty function method (cf. [7]) for general optimization constraints involves modifying the objective function with a penalty term that depends on the constraint violation $h: \mathbb{R}^n \to \mathbb{R}$. The original optimization problem in (4.1) is thus modified as follows:

$$\min_{\mathbf{x} \in \Omega \subseteq \mathbb{R}^n} f(\mathbf{x}) + \rho\, h(\mathbf{x}), \tag{4.6}$$

where $\rho > 0$ is a penalty parameter. The modified optimization problem may still have constraints that are straightforward to handle.

If the penalty parameter is iteratively increased (tending to infinity), the solution of (4.6) converges to that of the original problem in (4.1). However, in certain cases, a finite (and fixed) value of the penalty parameter $\rho$ also yields the correct solution (this is the so-called exact penalty; see [7]). For exact penalties, the modified cost function is not smooth around the solution [7], and thus the corresponding optimization problem can be significantly more involved than (4.6) with a smooth penalty. However, one can argue that in the derivative-free case exact penalty functions may in some cases be attractive. Common definitions of $h(\mathbf{x})$, where $I$ and $J$ denote the indices that refer to inequality and equality constraints, respectively, are

$$h(\mathbf{x}) = \frac{1}{2}\left(\sum_{i \in I} \max\left(0, g_i(\mathbf{x})\right)^2 + \sum_{j \in J} g_j^2(\mathbf{x})\right),$$

the quadratic penalty, and

$$h(\mathbf{x}) = \sum_{i \in I} \max\left(0, g_i(\mathbf{x})\right) + \sum_{j \in J} \left|g_j(\mathbf{x})\right|,$$

an exact penalty. Note that with these penalties the search considers both feasible and infeasible points. Optimization methodologies in which the optimum can be approached from outside the feasible region are known as exterior methods.

The log-barrier penalty (for inequality constraints)

$$h(\mathbf{x}) = -\sum_{i \in I} \log\left(-g_i(\mathbf{x})\right)$$

has to be used with a decreasing penalty parameter (tending to zero). This type of penalty method (also known as a barrier method) confines the optimization to the feasible region of the search space. Interior methods aim at reaching the optimum from inside the feasible region.
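The three penalties can be written as functions of the constraint values $g_i(\mathbf{x})$ and $g_j(\mathbf{x})$. The small problem at the end (minimize $x^2$ subject to $1 - x \le 0$) and the grid search used to minimize (4.6) are hypothetical illustrations. For the quadratic penalty the minimizer is $x_\rho = \rho / (2 + \rho) < 1$, which approaches $x^* = 1$ from the infeasible side as $\rho$ grows, in line with the exterior behavior described above.

```python
import math


def quadratic_penalty(g_ineq, g_eq):
    """h(x) = 1/2 (sum_i max(0, g_i)^2 + sum_j g_j^2), from constraint values."""
    return 0.5 * (sum(max(0.0, gi) ** 2 for gi in g_ineq)
                  + sum(gj ** 2 for gj in g_eq))


def exact_penalty(g_ineq, g_eq):
    """h(x) = sum_i max(0, g_i) + sum_j |g_j| (nonsmooth at g = 0)."""
    return sum(max(0.0, gi) for gi in g_ineq) + sum(abs(gj) for gj in g_eq)


def log_barrier(g_ineq):
    """h(x) = -sum_i log(-g_i); +inf outside the strictly feasible region."""
    if any(gi >= 0.0 for gi in g_ineq):
        return math.inf
    return -sum(math.log(-gi) for gi in g_ineq)


# Hypothetical example: minimize x^2 subject to g(x) = 1 - x <= 0
def penalized(x, rho):
    return x * x + rho * quadratic_penalty([1.0 - x], [])


grid = [k / 10000.0 for k in range(20001)]       # x in [0, 2]
for rho in (1.0, 10.0, 100.0):
    x_rho = min(grid, key=lambda x: penalized(x, rho))
    print(rho, x_rho)   # approaches x* = 1 from the infeasible side x < 1
```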

In [53], non-quadratic penalties have been suggested for pattern search techniques. However, the optimization problems presented in that work are somewhat simpler than those found in many practical situations, so the recommendations given might not be generally applicable. In future research, it will be useful to explore further the performance of different penalty functions in the context of simulation-based optimization.



4.5.2 Augmented Lagrangian Method 

As mentioned above, in exterior penalty function methods, as $\rho \to \infty$ the local minimum is approached from outside the feasible region. Not surprisingly, there is a way to shift the feasible region so that one is able to determine the local solution for a finite penalty parameter. See, for example, [54, 55] for original references, and also [7], Chapter 17.

Augmented Lagrangian methods [56, 57] aim at minimizing, in the equality constraint case, the following extended cost function

$$\min_{\mathbf{x} \in \Omega \subseteq \mathbb{R}^n} f(\mathbf{x}) + \frac{1}{2}\rho \left\|\mathbf{g}(\mathbf{x})\right\|_2^2 + \boldsymbol{\lambda}^T \mathbf{g}(\mathbf{x}), \tag{4.7}$$

where $\rho > 0$ is a penalty parameter, and $\boldsymbol{\lambda} \in \mathbb{R}^m$ are Lagrange multipliers. This cost function can indeed be interpreted as a quadratic penalty with the constraints shifted by some constant term [56]. As in penalty methods, the penalty parameter and the Lagrange multipliers are iteratively updated. It turns out that if one is sufficiently stationary for Equation (4.7), which is exactly when we have good approximations for the Lagrange multipliers, then $\boldsymbol{\lambda}$ can be updated via

$$\boldsymbol{\lambda}^+ = \boldsymbol{\lambda} + \rho\, \mathbf{g}(\mathbf{x}), \tag{4.8}$$
Fig. 4.4 An idealized (pattern search) filter at iteration k (modified from [19])



where $\boldsymbol{\lambda}^+$ denotes the updated Lagrange multipliers. Otherwise, one should increase the penalty parameter $\rho$ (say, by multiplying it by 10). The Lagrange multipliers are typically initialized to zero. What is significant is that one can prove (see e.g. [56]) that after a finite number of iterations the penalty parameter is no longer updated, and that the whole scheme eventually converges to a solution of the original optimization problem in (4.1). Inequality constraints can also be incorporated in the augmented Lagrangian framework by introducing slack variables and simple bounds [56]. The augmented Lagrangian approach can be combined with most optimization algorithms. For example, refer to [58] for a nonlinear programming methodology based on generalized pattern search.
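An outer loop implementing (4.7) and (4.8) can be sketched as follows. Two parts are our own assumptions: the inner minimization uses a simple compass (coordinate) search as a stand-in for any derivative-free optimizer, and "sufficiently stationary" is replaced by the practical test that the constraint violation has dropped to at most a quarter of its last accepted value. The example problem, minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, has solution $(0.5, 0.5)$ with multiplier $\lambda^* = -1$ under the sign convention of (4.7). All function names are hypothetical.

```python
def compass_search(phi, x0, step=0.5, tol=1e-9):
    """Stand-in derivative-free inner solver: coordinate search with step halving."""
    x, fx = list(x0), phi(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = x[:]
                y[i] += d
                fy = phi(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2.0          # no poll step helped: refine the pattern
    return x


def augmented_lagrangian_value(f, g, z, lam, rho):
    """Objective of Eq. (4.7): f + (rho/2)||g||^2 + lam^T g."""
    gz = g(z)
    return (f(z) + 0.5 * rho * sum(v * v for v in gz)
            + sum(lk * v for lk, v in zip(lam, gz)))


def augmented_lagrangian(f, g, x0, rho=1.0, outer_iters=20, tol=1e-6):
    lam = [0.0] * len(g(x0))                 # multipliers initialized to zero
    x, last_viol = list(x0), float("inf")
    for _ in range(outer_iters):
        x = compass_search(
            lambda z: augmented_lagrangian_value(f, g, z, lam, rho), x)
        gx = g(x)
        viol = max(abs(v) for v in gx)
        if viol <= 0.25 * last_viol:         # assumed progress test
            lam = [lk + rho * v for lk, v in zip(lam, gx)]   # Eq. (4.8)
            last_viol = viol
        else:
            rho *= 10.0                      # otherwise increase the penalty
        if viol < tol:
            break
    return x, lam, rho


x, lam, rho = augmented_lagrangian(
    lambda z: z[0] ** 2 + z[1] ** 2,         # objective
    lambda z: [z[0] + z[1] - 1.0],           # equality constraint g(x) = 0
    [0.0, 0.0])
print(x, lam, rho)   # x close to [0.5, 0.5], lam close to [-1.0]
```

In this run the penalty parameter is increased only once and then stays fixed while the multiplier estimate converges, which mirrors the finite-update property mentioned above.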



4.5.3 Filter Method 

A relatively recent approach that avoids using a penalty parameter and has been rather successful is the class of so-called filter methods [59, 7]. Using filters, the original problem (4.1) is typically viewed as a bi-objective optimization problem. Besides minimizing the cost function $f(\mathbf{x})$, one also seeks to reduce the constraint violation $h(\mathbf{x})$. The concept of dominance, crucial in multi-objective optimization, is defined as follows: the point $\mathbf{x}_1 \in \mathbb{R}^n$ dominates $\mathbf{x}_2 \in \mathbb{R}^n$ if and only if either $f(\mathbf{x}_1) \le f(\mathbf{x}_2)$ and $h(\mathbf{x}_1) < h(\mathbf{x}_2)$, or $f(\mathbf{x}_1) < f(\mathbf{x}_2)$ and $h(\mathbf{x}_1) \le h(\mathbf{x}_2)$. A filter is a set of pairs $(h(\mathbf{x}), f(\mathbf{x}))$ such that no pair dominates another pair. In practice, a maximum allowable constraint violation $h_{\max}$ is specified. This is accomplished by introducing the pair $(h_{\max}, -\infty)$ in the filter. An idealized filter (at iteration $k$) is shown in Figure 4.4.

A filter can be understood as essentially an add-on for an optimization procedure. The intermediate solutions proposed by the optimization algorithm at a given iteration are accepted if they are not dominated by any point in the filter. The filter is updated at each iteration based on all the points evaluated by the optimizer. We reiterate that, as for exterior methods, the optimization search is enriched by considering infeasible points, although the ultimate solution is intended to be feasible (or very nearly so). Filters are often observed to lead to faster convergence than methods that rely only on feasible iterates.

Pattern search optimization techniques have previously been combined with filters [60]. In Hooke-Jeeves direct search, the filter establishes the acceptance criterion for each (unique) new solution. For schemes where, in each iteration, multiple solutions can be accepted by the filter (such as in GPS), the new polling center must be selected from the set of validated points. When the filter is not updated in a particular iteration (and thus the best feasible point is not improved), the pattern size is decreased. As in [60], when we combine GPS with a filter, the polling center at a given iteration will be the feasible point with the lowest cost function or, if no feasible points remain, the infeasible point with the lowest constraint violation. These two points, $(0, f_F)$ and $(h_I, f_I)$, respectively, are shown in Figure 4.4 (it is assumed that both points have just been accepted by the filter, and thus it makes sense to use one of them as the new polling center). Refer to [60] and [61] for more details on pattern search filter methods.
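The dominance test, the filter, and the polling-center rule can be sketched as follows. Representing points as $(h, f)$ tuples, rejecting exact duplicates, and the names `Filter` and `poll_center` are our own choices; the sketch does not include the pattern-search iteration itself.

```python
def dominates(a, b):
    """a and b are (h, f) pairs; True if a dominates b (definition above)."""
    return (a[1] <= b[1] and a[0] < b[0]) or (a[1] < b[1] and a[0] <= b[0])


class Filter:
    """Set of mutually non-dominated (h, f) pairs with a cap h_max."""

    def __init__(self, h_max):
        # the pair (h_max, -inf) rejects every point with h >= h_max
        self.pairs = [(h_max, float("-inf"))]

    def accepts(self, pair):
        # acceptable if no stored pair dominates it (duplicates rejected too)
        return not any(dominates(p, pair) or p == pair for p in self.pairs)

    def add(self, pair):
        if not self.accepts(pair):
            return False
        # drop the pairs that the new one dominates, then store it
        self.pairs = [p for p in self.pairs if not dominates(pair, p)]
        self.pairs.append(pair)
        return True


def poll_center(candidates):
    """candidates: list of (x, (h, f)); polling-center rule used with GPS."""
    feasible = [c for c in candidates if c[1][0] == 0.0]
    if feasible:
        return min(feasible, key=lambda c: c[1][1])[0]   # lowest cost
    return min(candidates, key=lambda c: c[1][0])[0]     # lowest violation


flt = Filter(h_max=1.0)
print(flt.add((0.5, 3.0)), flt.add((0.0, 4.0)), flt.add((0.6, 3.5)))  # True True False
```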

4.5.4 Other Approaches 

We now briefly review a number of constraint handling methodologies that have been proposed for evolutionary algorithms. Repair algorithms [62, 63] project infeasible solutions back to the feasible space. This projection is in most cases accomplished in an approximate manner, and can be as complex as solving the optimization problem itself. Repair algorithms can be seen as local procedures that aim at reducing constraint violation. In the so-called Baldwinian case, the fitness of the repaired solution replaces the fitness of the original (infeasible) solution. In the Lamarckian case, the repaired (feasible) solution itself also replaces the original infeasible solution in the population.

Constraint-handling techniques borrowed from multi-objective optimization are based on the idea of dealing with each constraint as an additional objective [64, 65, 66, 67, 68, 69]. Under this assumption, multi-objective optimization methods such as NSGA-II [70] or SPEA [71] can be applied. The output of a multi-objective approach for constrained optimization is an approximation of a Pareto set that involves the objective function and the constraints. The user may then select one or more solutions from the Pareto set. A simpler but related and computationally less expensive procedure is the behavioral memory method presented in [72]. This evolutionary method concentrates on minimizing the constraint violation of each constraint sequentially, and the objective function is addressed separately afterwards. However, treating the objective function and the constraints independently may in many cases yield infeasible solutions.
Further constraint handling methods that rely neither on repair algorithms nor on multi-objective approaches have been proposed in the EA literature. In [73], a technique based on a multi-membered evolution strategy with a feasibility comparison mechanism is introduced. The dynamic multi-swarm particle optimizer studied in [74] makes use of a set of sub-swarms that focus on different constraints, and is coupled with a local search algorithm (sequential quadratic programming).



4.6 Concluding Remarks 

In this chapter, we have concentrated on methods for solving optimization problems without derivatives. The existence of local optima makes a hard optimization problem even harder. Many methods have been proposed to solve non-convex optimization problems. The approaches range from pattern search for local optimization problems to stochastic bio-inspired search heuristics for multi-modal problems. Deterministic local methods are guaranteed (under suitable assumptions) to converge to local optima, and restart variants can be applied to avoid unsatisfactory solutions. Stochastic methods are not guaranteed to find the global optimum, but in some practical cases they can be beneficial.

The hybridization of local and global optimizers has led to a paradigm sometimes called memetic algorithms or hybrid metaheuristics [75, 76]. A number of hybridizations have been proposed, but they are often tailored to specific problem types and search domains due to their specific operators and methods. In the memetic method introduced in [77] for continuous search spaces, a gradient-based scheme is combined with a deterministic perturbation component. The local optimization procedure for real-valued variables described in [78] is based on variable neighborhood search. It would be very useful if future research were dedicated to a better theoretical understanding of the hybridization of local and global optimization algorithms.

Most problems found in practice involve constraints. We have outlined a number of constraint handling techniques that can be incorporated in a derivative-free optimization framework. Though penalty functions are appealing due to their simplicity, some of the other approaches mentioned here may be more efficient while still being relatively easy to implement.

Multi-objective optimization is an important challenge for derivative-free methodologies. Some of the evolutionary techniques mentioned above have performed successfully on some relatively simple multi-objective test cases. Other areas where derivative-free optimization could potentially be very helpful include dynamic optimization, mixed-integer nonlinear programming, and optimization under uncertainty (stochastic programming).

Acknowledgements. We are grateful to the industry sponsors of the Stanford Smart Fields 
Consortium for partial funding of this work, and also to J. Smith for his valuable suggestions. 



4 Derivative-Free Optimization 79 

References 

[1] Pironneau, O.: On optimum design in fluid mechanics. Journal of Fluid Mechanics 64, 97-110 (1974)

[2] Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspectives on some classical and modern methods. SIAM Review 45(3), 385-482 (2003)

[3] Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. MPS-SIAM Series on Optimization. MPS-SIAM (2009)

[4] Gilmore, P., Kelley, C.T.: An implicit filtering algorithm for optimization of functions with many local minima. SIAM Journal on Optimization 5, 269-285 (1995)

[5] Kelley, C.T.: Iterative Methods for Optimization. In: Frontiers in Applied Mathematics, SIAM, Philadelphia (1999)

[6] Dennis Jr., J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM's Classics in Applied Mathematics Series. SIAM, Philadelphia (1996)

[7] Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Heidelberg (2006)

[8] Schilders, W.H.A., van der Vorst, H.A., Rommes, J.: Model Order Reduction: Theory, Research Aspects and Applications. Mathematics in Industry Series. Springer, Heidelberg (2008)

[9] Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. MPS-SIAM Series on Optimization. MPS-SIAM (2000)

[10] Meza, J.C., Martinez, M.L.: On the use of direct search methods for the molecular conformation problem. Journal of Computational Chemistry 15, 627-632 (1994)

[11] Booker, A.J., Dennis Jr., J.E., Frank, P.D., Moore, D.W., Serafini, D.B.: Optimization using surrogate objectives on a helicopter test example. In: Borggaard, J.T., Burns, J., Cliff, E., Schreck, S. (eds.) Computational Methods for Optimal Design and Control, pp. 49-58. Birkhauser, Basel (1998)

[12] Marsden, A.L., Wang, M., Dennis Jr., J.E., Moin, P.: Trailing-edge noise reduction using derivative-free optimization and large-eddy simulation. Journal of Fluid Mechanics 572, 13-36 (2003)

[13] Duvigneau, R., Visonneau, M.: Hydrodynamic design using a derivative-free method. Structural and Multidisciplinary Optimization 28, 195-205 (2004)

[14] Fowler, K.R., Reese, J.P., Kees, C.E., Dennis Jr., J.E., Kelley, C.T., Miller, C.T., Audet, C., Booker, A.J., Couture, G., Darwin, R.W., Farthing, M.W., Finkel, D.E., Gablonsky, J.M., Gray, G., Kolda, T.G.: Comparison of derivative-free optimization methods for groundwater supply and hydraulic capture community problems. Advances in Water Resources 31(5), 743-757 (2008)

[15] Oeuvray, R., Bierlaire, M.: A new derivative-free algorithm for the medical image registration problem. International Journal of Modelling and Simulation 27, 115-124 (2007)

[16] Marsden, A.L., Feinstein, J.A., Taylor, C.A.: A computational framework for derivative-free optimization of cardiovascular geometries. Computational Methods in Applied Mechanics and Engineering 197, 1890-1905 (2008)

[17] Artus, V., Durlofsky, L.J., Onwunalu, J.E., Aziz, K.: Optimization of nonconventional wells under uncertainty using statistical proxies. Computational Geosciences 10, 389-404 (2006)



80 O. Kramer, D. Echeverria Ciaurri, and S. Koziel 

[18] Dadashpour, M., Echeverria Ciaurri, D., Mukerji, T., Kleppe, J., Landr0, M.: A 
derivative-free approach for the estimation of porosity and permeability using time- 
lapse seismic and production data. Journal of Geophysics and Engineering 7, 35 1-368 
(2010) 

[19] Echeverria Ciaurri, D., Isebor, O.J., Durlofsky, L.J.: Application of derivativefree 
methodologies for generally constrained oil production optimization problems. In- 
ternational Journal of Mathematical Modelling and Numerical Optimisation 2(2), 
134-161 (2011) 

[20] Onwunalu, J.E., Durlofsky, L.J.: Application of a particle swarm optimization algo- 
rithm for determining optimum well location and type. Computational Geosciences 14, 
183-198 (2010) 

[21] Zhang, H., Conn, A.R., Scheinberg, K.: A derivative-free algorithm for least-squares
minimization. SIAM Journal on Optimization 20(6), 3555-3576 (2010)

[22] Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on
Optimization 7(1), 1-25 (1997)

[23] Audet, C., Dennis Jr., J.E.: Analysis of generalized pattern searches. SIAM Journal on
Optimization 13(3), 889-903 (2002)

[24] Audet, C., Dennis Jr., J.E.: Mesh adaptive direct search algorithms for constrained
optimization. SIAM Journal on Optimization 17(1), 188-217 (2006)

[25] Hooke, R., Jeeves, T.A.: Direct search solution of numerical and statistical problems. 
Journal of the ACM 8(2), 212-229 (1961) 

[26] Powell, M.J.D.: The NEWUOA software for unconstrained optimization without 
derivatives. Technical report DAMTP 2004/NA5, Dept. of Applied Mathematics and 
Theoretical Physics, University of Cambridge (2004) 

[27] Oeuvray, R., Bierlaire, M.: BOOSTERS: a derivative-free algorithm based on radial
basis functions. International Journal of Modelling and Simulation 29(1), 26-36 (2009)

[28] Metropolis, N., Rosenbluth, A., Teller, A., Teller, E.: Equation of state calculations by
fast computing machines. Journal of Chemical Physics 21(6), 1087-1092 (1953)

[29] Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.: Optimization by simulated annealing.
Science 220(4598), 671-680 (1983)

[30] Glover, F.: Tabu search - part I. ORSA Journal on Computing 1(3), 190-206 (1989)

[31] Glover, F.: Tabu search - part II. ORSA Journal on Computing 2(1), 4-32 (1990)

[32] Dorigo, M.: Optimization, Learning and Natural Algorithms. PhD thesis, Dept. of
Electronics, Politecnico di Milano (1992)

[33] Dorigo, M., Stützle, T.: Ant Colony Optimization. Prentice-Hall, Englewood Cliffs
(2004)

[34] Farmer, J., Packard, N., Perelson, A.: The immune system, adaptation and machine
learning. Physica D 22, 187-204 (1986)

[35] Castro, L.N.D., Timmis, J.: Artificial Immune Systems: A New Computational
Intelligence. Springer, Heidelberg (2002)

[36] Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan
Press (1975)

[37] Fogel, D.B.: Artificial Intelligence through Simulated Evolution. Wiley, Chichester
(1966)

[38] Beyer, H.-G., Schwefel, H.-P.: Evolution strategies - a comprehensive introduction.
Natural Computing 1, 3-52 (2002)

[39] Rechenberg, I.: Evolutionsstrategie: Optimierung Technischer Systeme nach
Prinzipien der Biologischen Evolution. Frommann-Holzboog (1973)






[40] Schwefel, H.-P.: Numerische Optimierung von Computer-Modellen mittels der
Evolutionsstrategie. Birkhäuser, Basel (1977)

[41] Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading (1989)

[42] Holland, J.H.: Hidden Order: How Adaptation Builds Complexity. Addison-Wesley,
London (1995)

[43] Beyer, H.-G.: An alternative explanation for the manner in which genetic algorithms
operate. BioSystems 41(1), 1-15 (1997)

[44] Schwefel, H.-P.: Adaptive Mechanismen in der biologischen Evolution und ihr Einfluss
auf die Evolutionsgeschwindigkeit. In: Interner Bericht der Arbeitsgruppe Bionik und
Evolutionstechnik am Institut für Mess- und Regelungstechnik, TU Berlin (1974)

[45] Beyer, H.-G., Sendhoff, B.: Covariance matrix adaptation revisited: the CMSA
evolution strategy. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.)
PPSN 2008. LNCS, vol. 5199, pp. 123-132. Springer, Heidelberg (2008)

[46] Ostermeier, A., Gawelczyk, A., Hansen, N.: A derandomized approach to
self-adaptation of evolution strategies. Evolutionary Computation 2(4), 369-380 (1994)

[47] Teytaud, F., Teytaud, O.: Why one must use reweighting in estimation of distribution
algorithms. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary
Computation (GECCO 2009), pp. 453-460 (2009)

[48] Grahl, J., Bosman, P.A.N., Rothlauf, F.: The correlation-triggered adaptive variance
scaling idea. In: Proceedings of the 8th Annual Conference on Genetic and Evolutionary
Computation (GECCO 2006), pp. 397-404 (2006)

[49] Bosman, P.A.N., Grahl, J., Thierens, D.: Enhancing the performance of
maximum-likelihood Gaussian EDAs using anticipated mean shift. In: Rudolph, G.,
Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199,
pp. 133-143. Springer, Heidelberg (2008)

[50] Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE
International Conference on Neural Networks, pp. 1942-1948 (1995)

[51] Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global
optimization over continuous spaces. Journal of Global Optimization 11, 341-359
(1997)

[52] Chakraborty, U.: Advances in Differential Evolution. SCI. Springer, Heidelberg (2008)

[53] Griffin, J.D., Kolda, T.G.: Nonlinearly-constrained optimization using asynchronous
parallel generating set search. Technical report SAND2007-3257, Sandia National
Laboratories (2007)

[54] Hestenes, M.R.: Multiplier and gradient methods. Journal of Optimization Theory
and Applications 4(5), 303-320 (1969)

[55] Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In:
Fletcher, R. (ed.) Optimization, pp. 283-298. Academic Press, London (1969)

[56] Conn, A.R., Gould, N.I.M., Toint, P.L.: A globally convergent augmented Lagrangian
algorithm for optimization with general constraints and simple bounds. SIAM Journal
on Numerical Analysis 28(2), 545-572 (1991)

[57] Conn, A.R., Gould, N.I.M., Toint, P.L.: LANCELOT: A Fortran Package for
Large-Scale Nonlinear Optimization (Release A). Computational Mathematics.
Springer, Heidelberg (1992)

[58] Lewis, R.M., Torczon, V.: A direct search approach to nonlinear programming
problems using an augmented Lagrangian method with explicit treatment of the linear
constraints. Technical report WM-CS-2010-01, Dept. of Computer Science, College of
William & Mary (2010)




[59] Fletcher, R., Leyffer, S., Toint, P.L.: A brief history of filter methods. Technical report 
ANL/MCS/JA-58300, Argonne National Laboratory (2006) 

[60] Audet, C., Dennis Jr., J.E.: A pattern search filter method for nonlinear programming
without derivatives. SIAM Journal on Optimization 14(4), 980-1010 (2004)

[61] Abramson, M.A.: NOMADm version 4.6 User's Guide. Dept. of Mathematics and 
Statistics, Air Force Institute of Technology (2007) 

[62] Belur, S.V.: CORE: constrained optimization by random evolution. In: Koza, J.R. (ed.) 
Late Breaking Papers at the Genetic Programming 1997 Conference, pp. 280-286 
(1997) 

[63] Coello Coello, C.A.: Theoretical and numerical constraint handling techniques used 
with evolutionary algorithms: a survey of the state of the art. Computer Methods in 
Applied Mechanics and Engineering 191(11-12), 1245-1287 (2002) 

[64] Parmee, I.C., Purchase, G.: The development of a directed genetic search technique 
for heavily constrained design spaces. In: Parmee, I.C. (ed.) Proceedings of the Con- 
ference on Adaptive Computing in Engineering Design and Control, pp. 97-102. 
University of Plymouth (1994) 

[65] Surry, P.D., Radcliffe, N.J., Boyd, I.D.: A multi-objective approach to constrained 
optimisation of gas supply networks: the COMOGA method. In: Fogarty, T.C. (ed.) 
AISB-WS 1995. LNCS, vol. 993, pp. 166-180. Springer, Heidelberg (1995) 

[66] Coello Coello, C.A.: Treating constraints as objectives for single-objective evolution- 
ary optimization. Engineering Optimization 32(3), 275-308 (2000) 

[67] Coello Coello, C.A.: Constraint handling through a multiobjective optimization tech- 
nique. In: Proceedings of the 8th Annual conference on Genetic and Evolutionary 
Computation (GECCO 1999), pp. 117-118 (1999) 

[68] Jiménez, F., Verdegay, J.L.: Evolutionary techniques for constrained optimization
problems. In: Zimmermann, H.J. (ed.) 7th European Congress on Intelligent Tech- 
niques and Soft Computing (EUFIT 1999). Springer, Heidelberg (1999) 

[69] Mezura-Montes, E., Coello Coello, C.A.: Constrained optimization via multiobjective 
evolutionary algorithms. In: Knowles, J., Corne, D., Deb, K., Deva, R. (eds.) Multiob- 
jective Problem Solving from Nature. Natural Computing Series, pp. 53-75. Springer, 
Heidelberg (2008) 

[70] Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multiobjective
genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 
182-197 (2002) 

[71] Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolution- 
ary algorithm for multiobjective optimization. In: Evolutionary Methods for Design, 
Optimisation and Control with Application to Industrial Problems (EUROGEN 2001), 
pp. 95-100 (2002) 

[72] Schoenauer, M., Xanthakis, S.: Constrained GA optimization. In: Forrest, S. (ed.) 
Proceedings of the 5th International Conference on Genetic Algorithms (ICGA 1993), 
pp. 573-580. Morgan Kaufmann, San Francisco (1993) 

[73] Montes, E.M., Coello Coello, C.A.: A simple multi-membered evolution strategy to
solve constrained optimization problems. IEEE Transactions on Evolutionary
Computation 9(1), 1-17 (2005)

[74] Liang, J., Suganthan, P.: Dynamic multi-swarm particle swarm optimizer with a novel 
constraint-handling mechanism. In: Yen, G.G., Lucas, S.M., Fogel, G., Kendall, G., 
Salomon, R., Zhang, B.-T., Coello Coello, C.A., Runarsson, T.P. (eds.) Proceedings of
the 2006 IEEE Congress on Evolutionary Computation (CEC 2006), pp. 9-16. IEEE 
Press, Los Alamitos (2006) 




[75] Raidl, G.R.: A unified view on hybrid metaheuristics. In: Almeida, F., Blesa Aguilera,
M.J., Blum, C., Moreno Vega, J.M., Pérez Pérez, M., Roli, A., Sampels, M. (eds.) HM
2006. LNCS, vol. 4030, pp. 1-12. Springer, Heidelberg (2006)

[76] Talbi, E.G.: A taxonomy of hybrid metaheuristics. Journal of Heuristics 8(5), 541-564 
(2002) 

[77] Griewank, A.: Generalized descent for global optimization. Journal of Optimization 
Theory and Applications 34, 11-39 (1981) 

[78] Duran Toksari, M., Güner, E.: Solving the unconstrained optimization problem by
a variable neighborhood search. Journal of Mathematical Analysis and
Applications 328(2), 1178-1187 (2007)



Chapter 5 

Maximum Simulated Likelihood Estimation: 

Techniques and Applications in Economics 



Ivan Jeliazkov and Alicia Lloro 



Abstract. This chapter discusses maximum simulated likelihood estimation when 
construction of the likelihood function is carried out by recently proposed Markov 
chain Monte Carlo (MCMC) methods. The techniques are applicable to parameter 
estimation and Bayesian and frequentist model choice in a large class of multivariate 
econometric models for binary, ordinal, count, and censored data. We implement the 
methodology in a study of the joint behavior of four categories of U.S. technology 
patents using a copula model for multivariate count data. The results reveal inter- 
esting complementarities among several patent categories and support the case for 
joint modeling and estimation. Additionally, we find that the simulated likelihood 
algorithm performs well. Even with few MCMC draws, the precision of the likeli- 
hood estimate is sufficient for producing reliable parameter estimates and carrying 
out hypothesis tests. 

5.1 Introduction 

The econometric analysis of models for multivariate discrete data is often compli- 
cated by intractability of the likelihood function, which can rarely be evaluated di- 
rectly and typically has to be estimated by simulation. In such settings, the efficiency 
of likelihood estimation plays a key role in determining the theoretical properties and 
practical appeal of standard optimization algorithms that rely on those estimates. For 
this reason, the development of fast and statistically efficient techniques for estimat- 
ing the value of the likelihood function has been at the forefront of much of the 
research on maximum simulated likelihood estimation in econometrics. 

In this chapter we examine the performance of a method for estimating the ordinate
of the likelihood function which was recently proposed in [8]. The method is rooted
in Markov chain Monte Carlo (MCMC) theory and simulation [3], [4], [15], [18],
and its ingredients have played a central role in Bayesian inference in econometrics
and statistics. The current implementation of those methods, however, is intended to
examine their applicability to purely frequentist problems such as maximum likelihood
estimation and hypothesis testing.

We implement the methodology to study firm-level patent registration data in 
four patent categories in the "computers & instruments" industry during the 1980s. 
One goal of this application is to examine how patent counts in each category are 
affected by firm characteristics such as sales, workforce size, and research & devel- 
opment (R&D) capital. A second goal is to study the degree of complementarity or 
substitutability that emerges among patent categories due to a variety of unobserved 
factors, such as firms' internal R&D decisions, resource concentration, managerial 
dynamics, technological spillovers, and the relevance of innovations across category 
boundaries. These factors can affect multiple patent categories simultaneously and 
necessitate the specification of a joint empirical structure that can flexibly capture 
interdependence patterns. 

We approach these tasks by considering a copula model for multivariate count 
data which enables us to pursue joint modeling and estimation. Because the out- 
come probabilities in the copula model are difficult to evaluate, we rely on MCMC 
simulation to evaluate the likelihood function. Moreover, to improve the perfor- 
mance of the optimization algorithm, we implement a quasi-Newton optimization 
method due to [1] that exploits a fundamental statistical relation to avoid direct
computation of the Hessian matrix of the log-likelihood function. The application
demonstrates that the simulated likelihood algorithm performs very well: even with
few MCMC draws, the precision of the likelihood estimate is sufficient for producing
reliable parameter estimates and hypothesis tests. The results support the case
for joint modeling and estimation in our application and reveal interesting
complementarities among several patent categories.

The remainder of this chapter is organized as follows. In Section 5.2 we present
the copula model that we use in our application and the likelihood function that we
use in estimation. The likelihood function is difficult to evaluate because it is given
by a set of integrals with no closed-form solution. For this reason, in Section 5.3
we present the MCMC-based simulation algorithm for evaluating this function and
discuss how it can be embedded in a standard optimization algorithm to maximize
the log-likelihood function and yield parameter estimates and standard errors.
Section 5.4 presents the results from our patent application and demonstrates the
performance of the estimation algorithm. Section 5.5 offers concluding remarks.



5.2 Copula Model 

In analyzing multiple data series, it is typically desirable to pursue joint modeling 
and estimation. Doing so allows researchers to investigate dependence structures 
among the individual variables of interest, leads to gains in estimation efficiency, 
and is also important for mitigating misspecification problems in nonlinear models. 
In many applications, however, a suitable joint distribution may be unavailable or
difficult to specify. This problem is particularly prevalent in multivariate discrete
data settings, and in cases where the variables are of different types (e.g. some
continuous, some discrete or censored). One area of research where incorporating a
flexible and interpretable correlation structure has been difficult is the empirical
analysis of multivariate count data [2], [21]. As a consequence, models for
multivariate counts have typically sacrificed generality for the sake of retaining
computational tractability. To deal with the aforementioned difficulties, we resort to
a copula modeling approach whose origins can be traced back to [17].

Formally, a copula maps the unit hypercube $[0,1]^q$ to the unit interval $[0,1]$ and
satisfies the following conditions:

1. $C(1,\dots,1,a_p,1,\dots,1) = a_p$ for every $p \in \{1,\dots,q\}$ and all $a_p \in [0,1]$;

2. $C(a_1,\dots,a_q) = 0$ if $a_p = 0$ for any $p \in \{1,\dots,q\}$;

3. $C$ is $q$-increasing, i.e. any hyperrectangle in $[0,1]^q$ has non-negative $C$-volume.

The generality of the approach rests on the recognition that a copula can be viewed
as a $q$-dimensional distribution function with uniform marginals, each of which
can be related to an arbitrary known cumulative distribution function (cdf) $F_j(\cdot)$,
$j = 1,\dots,q$. For example, if a random variable $u_j$ is uniform, $u_j \sim U(0,1)$, and
$y_j = F_j^{-1}(u_j)$, then $y_j \sim F_j(\cdot)$, because $\Pr(y_j \le y) = \Pr(u_j \le F_j(y)) = F_j(y)$.
As a consequence, if the variables $y_1,\dots,y_q$ have corresponding univariate cdfs
$F_1(y_1),\dots,F_q(y_q)$ taking values in $[0,1]$, a copula is a function that can be used
to link or "couple" those univariate marginal distributions to produce the joint
distribution function $F(y_1,\dots,y_q)$:

$$F(y_1,\dots,y_q) = C\big(F_1(y_1),\dots,F_q(y_q)\big). \tag{5.1}$$

A detailed overview of copulas is provided in [9], [13], and [20]. The key feature
of interest here is that copulas provide a way to model dependence among
multiple random variables when their joint distribution is not easy to specify,
including cases where the marginal distributions $\{F_j(\cdot)\}$ belong to entirely
different parametric classes.

There are several families of copulas, but the Gaussian copula is a natural modeling
choice when one is interested in extensions beyond the bivariate case. The Gaussian
copula is given by

$$C(\mathbf{u}\mid\Omega) = \Phi_q\big(\Phi^{-1}(u_1),\dots,\Phi^{-1}(u_q)\mid\Omega\big), \tag{5.2}$$

where $\mathbf{u} = (u_1,\dots,u_q)'$, $\Phi$ represents the standard normal cdf, and $\Phi_q$ is the cdf
for a multivariate normal vector $\mathbf{z} = (z_1,\dots,z_q)'$, $\mathbf{z} \sim N(0,\Omega)$, where $\Omega$ is in
correlation form with ones on the main diagonal. The data generating process implied
by the Gaussian copula specification is given by

$$y_{ij} = F_{ij}^{-1}\big(\Phi(z_{ij})\big), \quad \mathbf{z}_i \sim N(0,\Omega), \quad i = 1,\dots,n, \; j = 1,\dots,q, \tag{5.3}$$

where $F_{ij}$ is a cdf specified in terms of a vector of parameters $\theta_j$ and covariates
$\mathbf{x}_{ij}$, $q$ is the dimension of each vector $\mathbf{y}_i = (y_{i1},\dots,y_{iq})'$, and $n$ is the sample size.
Note that the correlation matrix $\Omega$ for the latent $\mathbf{z}_i$ induces dependence among the
elements of $\mathbf{y}_i$ and that the copula density will typically be analytically intractable.

The structures in (5.1), (5.2), and (5.3) are quite general and apply to both discrete
and continuous outcomes. However, it is important to recognize that the inverse cdf
mapping $F_{ij}^{-1}(\cdot)$ in (5.3) is one-to-one when $y_{ij}$ is continuous and many-to-one when
$y_{ij}$ is discrete. Therefore, in the latter case it is necessary to integrate over the values
of $\mathbf{z}_i$ that lead to the observed $\mathbf{y}_i$ in order to obtain their joint distribution. In our
implementation, this integration is performed by MCMC methods.
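To make the data generating process (5.3) concrete, the following minimal Python sketch draws correlated counts from a bivariate Gaussian copula. It is an illustration rather than the implementation used in this chapter: the Poisson marginals, the helper names `poisson_ppf` and `simulate_gaussian_copula`, and the parameter values are hypothetical choices, and any marginal cdf $F_{ij}$ (such as the negative binomial introduced below) could be substituted. The discrete inverse cdf returns the smallest count whose cdf reaches $\Phi(z_{ij})$, which is exactly the many-to-one mapping described above.

```python
import math

import numpy as np


def poisson_ppf(u, lam):
    """Discrete inverse cdf: smallest count y with F(y) >= u for a Poisson(lam) marginal."""
    u = min(u, 1.0 - 1e-12)          # guard against u rounding to 1
    y, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < u:
        y += 1
        pmf *= lam / y               # Poisson recursion p(y) = p(y - 1) * lam / y
        cdf += pmf
    return y


def simulate_gaussian_copula(n, omega, lams, seed=0):
    """Draw n vectors y_i with y_ij = F_j^{-1}(Phi(z_ij)), z_i ~ N(0, omega), as in (5.3)."""
    rng = np.random.default_rng(seed)
    q = len(lams)
    z = rng.multivariate_normal(np.zeros(q), omega, size=n)
    y = np.empty((n, q), dtype=int)
    for i in range(n):
        for j in range(q):
            u = 0.5 * (1.0 + math.erf(z[i, j] / math.sqrt(2.0)))   # Phi(z_ij)
            y[i, j] = poisson_ppf(u, lams[j])
    return y


omega = np.array([[1.0, 0.8], [0.8, 1.0]])   # hypothetical latent correlation matrix
y = simulate_gaussian_copula(20_000, omega, lams=[3.0, 1.5], seed=1)
print(y.mean(axis=0), np.corrcoef(y.T)[0, 1])
```

Each margin keeps its Poisson mean, while the latent correlation in $\Omega$ shows up as positive correlation between the simulated counts; with $\Omega = I$ the counts are independent.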

In this chapter, we use the Gaussian copula framework to specify a joint model
for multivariate count data, where each count variable $y_{ij} \in \{0,1,2,\dots\}$ follows a
variable-specific negative binomial distribution

$$y_{ij} \sim NB(\lambda_{ij}, \alpha_j), \tag{5.4}$$

with probability mass function (pmf) given by

$$\Pr(y_{ij}\mid\lambda_{ij},\alpha_j) = \frac{\Gamma(\alpha_j + y_{ij})}{\Gamma(1 + y_{ij})\,\Gamma(\alpha_j)}\, r_{ij}^{\alpha_j}(1 - r_{ij})^{y_{ij}}, \quad \alpha_j > 0, \; \lambda_{ij} > 0, \tag{5.5}$$

where $r_{ij} = \alpha_j/(\alpha_j + \lambda_{ij})$, and dependence on the covariates is modeled through
$\lambda_{ij} = \exp(\mathbf{x}_{ij}'\beta_j)$. Here, and in the remainder of this chapter, all vectors will be
taken to be column vectors. The distribution in (5.5) has mean $\lambda_{ij}$ and variance
$\lambda_{ij}(1 + \lambda_{ij}/\alpha_j)$, so that, depending on $\alpha_j$, it allows for varying degrees of
overdispersion. The variance can be much larger than the mean for small values of $\alpha_j$,
but in the limit (as $\alpha_j \to \infty$) the two are equal, as in the Poisson model where the
conditional variance equals the conditional mean. Negative binomial models are
carefully reviewed in [2], [6], and [21].

The cdf for the negative binomial distribution is obtained by summing the pmf
in (5.5) for values less than or equal to $y_{ij}$:

$$F_j(y_{ij}\mid\lambda_{ij},\alpha_j) = \sum_{k=0}^{y_{ij}} \Pr(k\mid\lambda_{ij},\alpha_j). \tag{5.6}$$
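A minimal Python sketch of (5.5) and (5.6) follows. It works on the log scale with `math.lgamma` so that the gamma functions do not overflow for large counts. The function names are illustrative; in the model, `lam` would be computed as $\lambda_{ij} = \exp(\mathbf{x}_{ij}'\beta_j)$ for given covariates and coefficients.

```python
import math


def nb_pmf(y, lam, alpha):
    """Negative binomial pmf (5.5) with mean lam and dispersion parameter alpha."""
    r = alpha / (alpha + lam)
    # Log scale keeps Gamma(alpha + y) and Gamma(1 + y) from overflowing.
    log_p = (math.lgamma(alpha + y) - math.lgamma(1 + y) - math.lgamma(alpha)
             + alpha * math.log(r) + y * math.log1p(-r))
    return math.exp(log_p)


def nb_cdf(y, lam, alpha):
    """Negative binomial cdf (5.6): sum of the pmf over k = 0, ..., y."""
    return sum(nb_pmf(k, lam, alpha) for k in range(y + 1))


# Hypothetical marginal with lambda = 2 and alpha = 0.5.
print(nb_pmf(0, 2.0, 0.5), nb_cdf(3, 2.0, 0.5))
```

For example, with $\lambda = 2$ and $\alpha = 0.5$ the pmf has mean 2 and variance $2(1 + 2/0.5) = 10$, illustrating the overdispersion discussed above.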

To relate the negative binomial distribution to the Gaussian copula, the pmf and
cdf computed in (5.5) and (5.6), respectively, can be used to find unique, recursively
determined cutpoints

$$\gamma_{ij,U} = \Phi^{-1}\big(F_j(y_{ij}\mid\lambda_{ij},\alpha_j)\big), \qquad \gamma_{ij,L} = \Phi^{-1}\big(F_j(y_{ij}\mid\lambda_{ij},\alpha_j) - \Pr(y_{ij}\mid\lambda_{ij},\alpha_j)\big), \tag{5.7}$$

that partition the standard normal distribution so that for $z_{ij} \sim N(0,1)$, we have
$\Pr(z_{ij} \le \gamma_{ij,U}) = F_j(y_{ij}\mid\lambda_{ij},\alpha_j)$ and $\Pr(\gamma_{ij,L} < z_{ij} \le \gamma_{ij,U}) = \Pr(y_{ij}\mid\lambda_{ij},\alpha_j)$.
(The lower cutpoint for $y_{ij}$ coincides with the upper cutpoint for $y_{ij} - 1$, and
$\gamma_{ij,L} = -\infty$ when $y_{ij} = 0$.) Hence, the cutpoints in (5.7) provide the range
$B_{ij} = (\gamma_{ij,L}, \gamma_{ij,U}]$ of $z_{ij}$ that is consistent with each observed outcome $y_{ij}$ in (5.3).
In turn, because $\mathbf{z}_i = (z_{i1},\dots,z_{iq})' \sim N(0,\Omega)$, the Gaussian copula representation
implies that the joint probability of observing the vector $\mathbf{y}_i = (y_{i1},\dots,y_{iq})'$ is given by

$$\Pr(\mathbf{y}_i\mid\theta,\Omega) = \int_{B_{i1}}\cdots\int_{B_{iq}} f_N(\mathbf{z}_i\mid 0,\Omega)\,d\mathbf{z}_i, \tag{5.8}$$

in which $f_N(\cdot)$ denotes the normal density and, for notational convenience, we let
$\theta = (\theta_1',\dots,\theta_q')'$, where $\theta_j = (\beta_j',\alpha_j)'$ represents the parameters of the $j$th
marginal model, which determine the regions of integration $B_{ij} = (\gamma_{ij,L},\gamma_{ij,U}]$,
$j = 1,\dots,q$.
Figure 5.1 offers an example of how the region of integration is constructed in the
simple bivariate case. Because of the dependence introduced by the correlation
matrix $\Omega$, the probabilities in (5.8) have no closed-form solution and will be estimated
by MCMC simulation methods in this chapter. Once computed, the probabilities
in (5.8), also called likelihood contributions, can be used to construct the likelihood
function

$$f(\mathbf{y}\mid\theta,\Omega) = \prod_{i=1}^{n} \Pr(\mathbf{y}_i\mid\theta,\Omega). \tag{5.9}$$



The likelihood function is then used in obtaining maximum likelihood estimates $\hat\theta$
and $\hat\Omega$, standard errors, and in performing model comparisons and hypothesis tests.
Because the likelihood contributions are obtained by simulation, $f(\mathbf{y}\mid\theta,\Omega)$ in (5.9)
is referred to as the simulated likelihood function, and the estimates $\hat\theta$ and $\hat\Omega$ are
called maximum simulated likelihood estimates. We next discuss the simulation and
optimization techniques that are used to obtain those estimates.
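The cutpoints in (5.7) can be computed directly from the negative binomial pmf and cdf. The following Python sketch (function names illustrative; `statistics.NormalDist` supplies $\Phi$ and $\Phi^{-1}$) returns $(\gamma_{ij,L},\gamma_{ij,U}]$ for one observed count, setting $\gamma_{ij,L} = -\infty$ when $y_{ij} = 0$. The clamp that keeps the argument of $\Phi^{-1}$ below 1 is an implementation safeguard, not part of (5.7).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides Phi (cdf) and Phi^{-1} (inv_cdf)


def nb_pmf(y, lam, alpha):
    """Negative binomial pmf (5.5)."""
    r = alpha / (alpha + lam)
    return math.exp(math.lgamma(alpha + y) - math.lgamma(1 + y) - math.lgamma(alpha)
                    + alpha * math.log(r) + y * math.log1p(-r))


def cutpoints(y, lam, alpha):
    """Return (gamma_L, gamma_U) of (5.7) for an observed count y."""
    lower_prob = sum(nb_pmf(k, lam, alpha) for k in range(y))   # F_j(y - 1)
    upper_prob = lower_prob + nb_pmf(y, lam, alpha)              # F_j(y)
    upper_prob = min(upper_prob, 1.0 - 1e-15)   # safeguard: inv_cdf needs p < 1
    gamma_u = STD_NORMAL.inv_cdf(upper_prob)
    gamma_l = -math.inf if y == 0 else STD_NORMAL.inv_cdf(lower_prob)
    return gamma_l, gamma_u


# Region B_i for a hypothetical bivariate outcome y_i = (0, 3).
region = [cutpoints(0, 2.0, 0.5), cutpoints(3, 1.0, 2.0)]
print(region)
```

When $\Omega$ is the identity matrix, (5.8) factorizes into the product of the marginal probabilities $\Phi(\gamma_{ij,U}) - \Phi(\gamma_{ij,L})$; it is the correlated case that requires simulation.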




Fig. 5.1 An example of the region of integration implied by a bivariate Gaussian copula 
model 
5.3 Estimation Methodology

Estimation by maximum simulated likelihood requires evaluation of the outcome
probabilities for each observation vector $\mathbf{y}_i$. Because each outcome probability in (5.8)
is defined by an analytically intractable multidimensional integral, in Section 5.3.1
we describe a method for evaluating the outcome probabilities which was introduced
in [8]. The method, called the Chib-Ritter-Tanner (CRT) method, stems from
developments in MCMC simulation and Bayesian model choice and is well suited for
evaluating outcome probabilities that comprise the likelihood of a variety of discrete
data models. Because the CRT estimator produces continuous and differentiable
estimates of (5.9), in Section 5.3.2 we describe how it can be applied in standard
quasi-Newton gradient-based optimization using the Berndt-Hall-Hall-Hausman (BHHH)
approach proposed in [1]. The BHHH approach exploits a fundamental statistical
relation to avoid direct computation of the Hessian matrix of the log-likelihood
function in the optimization algorithm.

5.3.1 The CRT Method 

The CRT method, proposed in [8], is derived from theory and techniques in MCMC
simulation and Bayesian model selection (see [3], [4]), where evaluation of
multidimensional integrals with no analytical solution is routinely required. To
understand the approach, note that the outcome probability in (5.8) can be rewritten as

$$\Pr(\mathbf{y}_i\mid\theta,\Omega) = \int 1\{\mathbf{z}_i \in B_i\}\, f_N(\mathbf{z}_i\mid 0,\Omega)\,d\mathbf{z}_i = \frac{1\{\mathbf{z}_i \in B_i\}\, f_N(\mathbf{z}_i\mid 0,\Omega)}{f_{TN_{B_i}}(\mathbf{z}_i\mid 0,\Omega)}, \tag{5.10}$$

where $B_i = B_{i1} \times B_{i2} \times \cdots \times B_{iq}$ and $f_{TN_{B_i}}(\cdot)$ represents the truncated normal
density that accounts for the truncation constraints reflected in $B_i$. This representation
follows by Bayes formula because $\Pr(\mathbf{y}_i\mid\theta,\Omega)$ is the normalizing constant of a
truncated normal distribution, and its representation in terms of the quantities in (5.10)
is useful for developing an estimation strategy that is simple and efficient. As discussed
in [3], this identity is particularly useful because it holds for any value of $\mathbf{z}_i \in B_i$ and
therefore, given that the numerator quantities $1\{\mathbf{z}_i^* \in B_i\}$ and $f_N(\mathbf{z}_i^*\mid 0,\Omega)$ in (5.10)
are directly available, the calculation is reduced to finding an estimate of the ordinate
$f_{TN_{B_i}}(\mathbf{z}_i^*\mid 0,\Omega)$ at a single point $\mathbf{z}_i^* \in B_i$, typically taken to be the sample mean of
the draws $\mathbf{z}_i \sim TN_{B_i}(0,\Omega)$ that will be simulated in the estimation procedure (details
will be presented shortly). The log-probability is subsequently obtained as

$$\ln\Pr(\mathbf{y}_i\mid\theta,\Omega) = \ln f_N(\mathbf{z}_i^*\mid 0,\Omega) - \ln f_{TN_{B_i}}(\mathbf{z}_i^*\mid 0,\Omega). \tag{5.11}$$

To estimate $f_{TN_{B_i}}(\mathbf{z}_i^*\mid 0,\Omega)$ in (5.11), the CRT method relies on random draws
$\mathbf{z}_i \sim TN_{B_i}(0,\Omega)$, which are produced by the Gibbs sampling algorithms of [5] or [16],
where a new value for $\mathbf{z}_i$ is generated by iteratively simulating each element $z_{ij}$
from its full-conditional density $z_{ij} \sim f(z_{ij}\mid y_{ij},\{z_{ik}\}_{k \ne j},\theta,\Omega) = TN_{B_{ij}}(\mu_{ij},\sigma_{ij}^2)$ for
$j = 1,\dots,q$. In the preceding, $\mu_{ij}$ and $\sigma_{ij}^2$ are the conditional mean and variance of
$z_{ij}$ given $\{z_{ik}\}_{k \ne j}$ and are obtained by the usual formulas for a conditional Gaussian
density. MCMC simulation of $\mathbf{z}_i \sim TN_{B_i}(0,\Omega)$ is an important tool for drawing from
this density, which is non-standard due to the multiple constraints defining the set
$B_i$ and the correlations in $\Omega$.

The Gibbs transition kernel for moving from a point $\mathbf{z}_i$ to $\mathbf{z}_i^*$ is given by the
product of univariate truncated normal full-conditional densities

$$K(\mathbf{z}_i,\mathbf{z}_i^*\mid\mathbf{y}_i,\theta,\Omega) = \prod_{j=1}^{q} f\big(z_{ij}^*\mid\mathbf{y}_i,\{z_{ik}^*\}_{k<j},\{z_{ik}\}_{k>j},\theta,\Omega\big). \tag{5.12}$$

Because the full-conditional densities represent the fundamental building blocks of
the Gibbs sampler, the additional coding involved in evaluating (5.12) is minimal.
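By virtue of the fact that the Gibbs sampler satisfies Markov chain invariance (see
[18], [4]), in our context we have that

$$f_{TN_{B_i}}(\mathbf{z}_i^*\mid 0,\Omega) = \int K(\mathbf{z}_i,\mathbf{z}_i^*\mid\mathbf{y}_i,\theta,\Omega)\, f_{TN_{B_i}}(\mathbf{z}_i\mid 0,\Omega)\,d\mathbf{z}_i, \tag{5.13}$$

a more general version of which was exploited for density estimation in [15].
Therefore, an estimate of $f_{TN_{B_i}}(\mathbf{z}_i^*\mid 0,\Omega)$ for use in (5.10) or (5.11) can be obtained by
invoking (5.13) and averaging the transition kernel $K(\mathbf{z}_i,\mathbf{z}_i^*\mid\mathbf{y}_i,\theta,\Omega)$ with respect to
draws from the truncated normal distribution $\mathbf{z}_i^{(g)} \sim TN_{B_i}(0,\Omega)$, $g = 1,\dots,G$, i.e.

$$\hat f_{TN_{B_i}}(\mathbf{z}_i^*\mid 0,\Omega) = \frac{1}{G}\sum_{g=1}^{G} K(\mathbf{z}_i^{(g)},\mathbf{z}_i^*\mid\mathbf{y}_i,\theta,\Omega). \tag{5.14}$$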

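When repeated evaluation of (5.14) is required, e.g. in evaluating derivatives of
$f(\mathbf{y}\mid\theta,\Omega)$, one should remember to reset the random number generation seed used in
the simulations, so that all evaluations share the same random numbers and differences
in the estimate reflect parameter changes rather than simulation noise. The CRT method
produces continuous and differentiable estimates of $\Pr(\mathbf{y}_i\mid\theta,\Omega)$ and can thus be
applied directly in derivative-based optimization as discussed next.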
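The steps in (5.10) through (5.14) can be assembled into a compact estimator. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code: it draws from each truncated normal full conditional by the inverse-cdf method (one simple choice, which may differ from the samplers cited above), uses a fixed burn-in, takes $\mathbf{z}_i^*$ to be the sample mean of the retained draws, and evaluates the kernel (5.12) with $\{z_{ik}^*\}_{k<j}$ and $\{z_{ik}^{(g)}\}_{k>j}$ in the conditioning set. The region $B_i$ is passed as vectors of lower and upper cutpoints, such as those given by (5.7); all function names and default values (`G`, `burn`, `seed`) are hypothetical. Inverse-cdf sampling loses accuracy when the truncation region lies far in a tail, so this version is meant for moderate cutpoints.

```python
import math
from statistics import NormalDist

import numpy as np

PHI = NormalDist()  # standard normal: cdf, pdf, inv_cdf


def _conditional(omega, z, j):
    """Mean and sd of z_j given the other elements of z, for z ~ N(0, omega)."""
    idx = [k for k in range(len(z)) if k != j]
    if not idx:
        return 0.0, math.sqrt(omega[j, j])
    w = np.linalg.solve(omega[np.ix_(idx, idx)], omega[idx, j])
    mean = float(w @ z[idx])
    var = float(omega[j, j] - omega[j, idx] @ w)
    return mean, math.sqrt(var)


def _tn_draw(mu, sd, lo, hi, u):
    """Inverse-cdf draw from N(mu, sd^2) truncated to (lo, hi], given u ~ U(0, 1)."""
    a, b = PHI.cdf((lo - mu) / sd), PHI.cdf((hi - mu) / sd)
    p = min(max(a + u * (b - a), 1e-15), 1.0 - 1e-15)  # keep inv_cdf argument in (0, 1)
    return mu + sd * PHI.inv_cdf(p)


def _tn_logpdf(x, mu, sd, lo, hi):
    """Log density at x of N(mu, sd^2) truncated to (lo, hi]."""
    mass = PHI.cdf((hi - mu) / sd) - PHI.cdf((lo - mu) / sd)
    return math.log(PHI.pdf((x - mu) / sd) / sd) - math.log(mass)


def _start(lo, hi):
    """A starting value inside (lo, hi]."""
    if math.isfinite(lo) and math.isfinite(hi):
        return 0.5 * (lo + hi)
    if math.isfinite(lo):
        return lo + 1.0
    if math.isfinite(hi):
        return hi - 1.0
    return 0.0


def crt_log_prob(lower, upper, omega, G=2000, burn=200, seed=0):
    """CRT estimate of ln Pr(z in B), z ~ N(0, omega), B = prod_j (lower[j], upper[j]]."""
    omega = np.asarray(omega, dtype=float)
    q = len(lower)
    rng = np.random.default_rng(seed)  # fixed seed: same random numbers on every call
    z = np.array([_start(lower[j], upper[j]) for j in range(q)])

    # Gibbs sampler for z ~ TN_B(0, omega): cycle through the full conditionals.
    draws = []
    for it in range(burn + G):
        for j in range(q):
            mu, sd = _conditional(omega, z, j)
            z[j] = _tn_draw(mu, sd, lower[j], upper[j], rng.uniform())
        if it >= burn:
            draws.append(z.copy())
    z_star = np.mean(draws, axis=0)  # lies in B because B is a box (convex)

    # (5.14): average the Gibbs kernel (5.12) K(z^(g), z*) over the retained draws.
    kernel = np.empty(G)
    for g, zg in enumerate(draws):
        point, log_k = zg.copy(), 0.0
        for j in range(q):
            # conditioning set: z*_k for k < j, z^(g)_k for k > j
            mu, sd = _conditional(omega, point, j)
            log_k += _tn_logpdf(z_star[j], mu, sd, lower[j], upper[j])
            point[j] = z_star[j]
        kernel[g] = math.exp(log_k)
    log_f_tn = math.log(kernel.mean())

    # (5.11): ln Pr = ln f_N(z* | 0, omega) - ln f_TN(z* | 0, omega).
    _, logdet = np.linalg.slogdet(omega)
    quad = float(z_star @ np.linalg.solve(omega, z_star))
    log_f_n = -0.5 * (q * math.log(2.0 * math.pi) + logdet + quad)
    return log_f_n - log_f_tn


omega = np.array([[1.0, 0.5], [0.5, 1.0]])
print(math.exp(crt_log_prob([0.0, 0.0], [math.inf, math.inf], omega)))
```

Two quick checks are available. When $\Omega = I$ the kernel does not depend on the draws, so the estimate reproduces $\prod_j[\Phi(\gamma_{ij,U}) - \Phi(\gamma_{ij,L})]$ up to rounding. For the orthant $B = (0,\infty)^2$ with correlation 0.5, the exact probability is $1/4 + \arcsin(0.5)/(2\pi) = 1/3$, which the estimate should approach as $G$ grows. The fixed seed follows the advice above about resetting the random number generator.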



5.3.2 Optimization Technique 

Let $\psi$ represent the vector of parameters that enter the log-likelihood function
$\ln f(\mathbf{y}\mid\psi)$. For the copula model that we considered in Section 5.2, $\psi$ consists of
the elements of $\theta$ and the unique off-diagonal entries of $\Omega$ (recall that $\Omega$ is a
symmetric positive definite matrix with ones on the main diagonal) and the likelihood
function $f(\mathbf{y}\mid\psi)$ is given in (5.9). Standard Newton-Raphson maximization of the
log-likelihood function $\ln f(\mathbf{y}\mid\psi)$ proceeds by updating the value of the parameter
vector in iteration $t$, $\psi_t$, to a new value, $\psi_{t+1}$, using the formula

$$\psi_{t+1} = \psi_t - \lambda H_t^{-1} g_t, \tag{5.15}$$
where $g_t = \partial\ln f(\mathbf{y}\mid\psi_t)/\partial\psi_t$ and $H_t = \partial^2\ln f(\mathbf{y}\mid\psi_t)/\partial\psi_t\,\partial\psi_t'$ are the
gradient vector and Hessian matrix, respectively, of the log-likelihood function at $\psi_t$,
and $\lambda$ is a step size. Gradient-based methods are widely used in log-likelihood
optimization because many statistical models have well-behaved log-likelihood
functions, and gradients and Hessian matrices are often required for statistical
inference, e.g. in obtaining standard errors or Lagrange multiplier test statistics. The
standard Newton-Raphson method, however, has well-known drawbacks. One is that
computation of the Hessian matrix can be quite computationally intensive. For a
$k$-dimensional parameter vector $\psi$, computing the Hessian by finite differences requires
$O(k^2)$ evaluations of the log-likelihood function. In the context of simulated likelihood
estimation, where $k$ can be very large and each likelihood evaluation can be very
costly, evaluation of the Hessian presents a significant burden that adversely affects
the computational efficiency of Newton-Raphson. Another problem is that $(-H_t)$ may
fail to be positive definite. This may be due to purely numerical issues (e.g. the
computed Hessian may be a poor approximation to the analytical one) or it may be
caused by non-concavity of the log-likelihood function. In those instances, the
Newton-Raphson step need not be an ascent direction, and the iterations may fail to
converge to a local maximum.

To deal with these difficulties, [1] noted that an application of a fundamental
statistical relationship, known as the information identity, obviates the need for direct
computation of the Hessian. Because we are interested in maximizing a statistical
function given by the sum of the log-likelihood contributions over a sample of
observations, it is possible to use statistical theory to speed up the iterations. In
particular, by definition we have

$$\int f(\mathbf{y}\mid\psi)\,d\mathbf{y} = 1, \tag{5.16}$$

where it is assumed that if there are any limits of integration, they do not depend on
the parameters $\psi$. With this assumption, an application of Leibniz's theorem implies
that $\partial\{\int f(\mathbf{y}\mid\psi)\,d\mathbf{y}\}/\partial\psi = \int \partial f(\mathbf{y}\mid\psi)/\partial\psi\,d\mathbf{y}$. Moreover, because
$\partial f(\mathbf{y}\mid\psi)/\partial\psi = \{\partial\ln f(\mathbf{y}\mid\psi)/\partial\psi\}\,f(\mathbf{y}\mid\psi)$, upon differentiation of both sides
of (5.16) with appropriate substitutions, we obtain

$$\int \frac{\partial\ln f(\mathbf{y}\mid\psi)}{\partial\psi}\, f(\mathbf{y}\mid\psi)\,d\mathbf{y} = 0. \tag{5.17}$$

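Differentiating (5.17) with respect to $\psi'$ once again (recalling that under our
assumptions we can interchange integration and differentiation), we get

$$\int \frac{\partial^2\ln f(\mathbf{y}\mid\psi)}{\partial\psi\,\partial\psi'}\, f(\mathbf{y}\mid\psi)\,d\mathbf{y} + \int \frac{\partial\ln f(\mathbf{y}\mid\psi)}{\partial\psi}\, \frac{\partial f(\mathbf{y}\mid\psi)}{\partial\psi'}\,d\mathbf{y} = 0, \tag{5.18}$$

where, taking advantage of the equality $\partial f(\mathbf{y}\mid\psi)/\partial\psi' = \{\partial\ln f(\mathbf{y}\mid\psi)/\partial\psi'\}\,f(\mathbf{y}\mid\psi)$
once again, we obtain the primary theoretical result underlying the BHHH approach:

$$-\int \frac{\partial^2\ln f(\mathbf{y}\mid\psi)}{\partial\psi\,\partial\psi'}\, f(\mathbf{y}\mid\psi)\,d\mathbf{y} = \int \frac{\partial\ln f(\mathbf{y}\mid\psi)}{\partial\psi}\, \frac{\partial\ln f(\mathbf{y}\mid\psi)}{\partial\psi'}\, f(\mathbf{y}\mid\psi)\,d\mathbf{y}. \tag{5.19}$$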
The left side of equation (5.19) gives $E(-H)$, whereas on the right side we have
$E(gg')$, which also happens to be $\mathrm{Var}(g)$ because from (5.17) we know that $E(g) = 0$.
Now, because the log-likelihood is the sum of independent log-likelihood
contributions, i.e. $\ln f(\mathbf{y}\mid\psi) = \sum_{i=1}^{n} \ln f(\mathbf{y}_i\mid\psi)$, it follows that

$$\mathrm{Var}(g) = \sum_{i=1}^{n} \mathrm{Var}(g_i) \approx \sum_{i=1}^{n} g_i g_i',$$

in which $g_i = \partial\ln f(\mathbf{y}_i\mid\psi)/\partial\psi$. Therefore, the BHHH algorithm for maximizing the
log-likelihood function relies on the recursions

$$\psi_{t+1} = \psi_t + \lambda B_t^{-1} g_t, \tag{5.20}$$

where $B_t = \sum_{i=1}^{n} [\partial\ln f(\mathbf{y}_i\mid\psi_t)/\partial\psi_t][\partial\ln f(\mathbf{y}_i\mid\psi_t)/\partial\psi_t]'$ is used in lieu of $-H_t$ in (5.15).

Working with the outer product of gradients matrix, $B_t$, has several important advantages. First, computation of the gradients requires O(k) likelihood evaluations and hence yields significant computational benefits relative to direct evaluation of $H_t$, which requires O(k²) such evaluations. Note that the individual scores $\{\partial \ln f(y_i|\psi_t)/\partial\psi_t\}$ are calculated anyway in computing $g_t$, and obtaining $B_t$ only involves taking their outer products, with no further evaluations of $\ln f(y|\psi)$. Second, $B_t$ is necessarily positive definite as long as the parameters are identified, even in regions where the log-likelihood is convex. Hence, the BHHH algorithm guarantees an increase in $\ln f(y|\psi)$ for a small enough step size λ. Finally, $B_t$ is typically more numerically stable than $H_t$, thereby reducing numerical difficulties in practice (e.g. with inversion, matrix decomposition, etc.).
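A minimal sketch of recursion (5.20), under assumptions that are not the chapter's: the log-likelihood is that of an ordinary Poisson regression (chosen only because its scores (y_i - μ_i)x_i are analytic, unlike the simulated likelihood of this chapter), λ is picked by simple step-halving, and all function names are hypothetical. Only per-observation scores are computed; B_t is their outer product, so no Hessian is ever formed.

```python
import numpy as np

def poisson_loglik(beta, X, y):
    # Poisson regression log-likelihood, dropping the constant -sum(ln y_i!)
    eta = X @ beta
    return float(np.sum(y * eta - np.exp(eta)))

def poisson_scores(beta, X, y):
    # n x k matrix whose i-th row is the score d ln f(y_i | beta) / d beta
    mu = np.exp(X @ beta)
    return (y - mu)[:, None] * X

def bhhh(beta0, X, y, max_iter=500, tol=1e-8):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        S = poisson_scores(beta, X, y)
        g = S.sum(axis=0)                  # gradient g_t
        B = S.T @ S                        # outer product of gradients B_t
        direction = np.linalg.solve(B, g)  # B_t^{-1} g_t without any Hessian
        ll_old = poisson_loglik(beta, X, y)
        lam = 1.0
        # halve the step size lambda until the log-likelihood does not decrease
        while poisson_loglik(beta + lam * direction, X, y) < ll_old and lam > 1e-10:
            lam /= 2.0
        beta_new = beta + lam * direction
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
beta_hat = bhhh(np.zeros(2), X, y)
print(beta_hat)
```

Because B_t is positive definite whenever the scores span the parameter space, the direction is always an ascent direction, which is why the step-halving loop can always find an acceptable λ.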

We make an important final remark about the interplay between simulation and optimization in maximum simulated likelihood estimation: precise estimation of the log-likelihood is essential for correct statistical inference. Specifically, it is crucial for computing likelihood ratio statistics, information criteria, marginal likelihoods and Bayes factors, and is also key to mitigating simulation biases in the maximum simulated likelihood estimates of parameters, standard errors, and confidence intervals (see [12, 19]). For instance, if the probabilities that enter $f(y|\psi)$ are estimated imprecisely, the maximum likelihood estimate will be biased (by Jensen's inequality, the logarithm of an unbiased probability estimate is biased downward), and the estimates of $\{\partial \ln f(y_i|\psi_t)/\partial\psi_t\}$ will be dominated by simulation noise. This adversely affects the estimated standard errors because $B_t$ is inflated by simulation noise rather than capturing genuine log-likelihood curvature. Hence, relying on the modal value of $B_t^{-1}$ as an estimate of the covariance matrix of $\hat\psi$ will produce standard errors and confidence bands that are too optimistic (too small). In extreme cases, parameters that are weakly identified may appear to be well estimated, entirely because of simulation noise. Such problems can be recognized by examining the behavior of the estimated standard errors (the square roots of the diagonal of the modal value of $B_t^{-1}$) for different values of the simulation size G in (5.14), to determine whether they are stable or tend to increase as G is increased (which would indicate that smaller simulation sizes understate them).
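The Jensen's-inequality bias mentioned above is easy to reproduce. In this sketch the "simulated probability" is, by assumption, the mean of G draws from Uniform(0, 2p), so it is exactly unbiased for p; the log of it is nevertheless biased downward, and the bias shrinks as G grows. The function name and the uniform simulator are hypothetical illustrations, not the CRT estimator used in this chapter.

```python
import numpy as np

def log_of_simulated_probability_bias(p, G, reps=4000, seed=0):
    """Average of ln(p_hat) - ln(p) when p_hat is an unbiased mean of G draws.

    Each draw is Uniform(0, 2p), so E[p_hat] = p exactly (an assumption made
    only for illustration); Jensen's inequality gives E[ln p_hat] < ln p.
    """
    rng = np.random.default_rng(seed)
    draws = rng.uniform(0.0, 2.0 * p, size=(reps, G))
    p_hat = draws.mean(axis=1)
    return float(np.mean(np.log(p_hat)) - np.log(p))

for G in (5, 50, 500):
    print(G, log_of_simulated_probability_bias(0.2, G))
```

For this particular simulator a second-order expansion suggests a bias of roughly -1/(6G), which is the kind of G-dependence the stability check on standard errors is designed to detect.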
Table 5.1 Descriptive statistics for the explanatory variables in the patent count application

Variable     Description                             Mean     SD
ln(SALES)    Log of real sales (millions)            6.830    1.703
ln(WF)       Log of number of company employees      2.398    1.717
ln(RDC)      Log of real R&D capital (millions)      5.593    1.815



5.4 Application 

In this section, we implement the methodology developed earlier to study the joint behavior of firm-level patent registrations in four technology categories in the "computers & instruments" industry during the 1980s. We use the data sample of ifTOl, which consists of n = 498 observations on 254 manufacturing firms from the U.S. Patent & Trademark Office data set discussed in Q and ITU. The response variable is a 4 × 1 vector $y_i$ (i = 1, ..., 498) containing firm-level counts of registered patents in communications (COM), computer hardware & software (CHS), computer peripherals (CP), and information storage (IS). The explanatory variables reflect the characteristics of individual firms and, in addition to a category-specific intercept, include sales (SALES), workforce size (WF), and R&D capital (RDC). Sales are measured by the annual sales revenue of each firm, while the size of the workforce is given by the number of employees that the firm reports to stockholders. R&D capital is a variable constructed from the history of R&D investment using inventory and depreciation rate accounting standards discussed in [7]. All explanatory variables, except the intercept, are measured on the logarithmic scale. Table 5.1 contains variable explanations along with descriptive statistics.

To analyze these multivariate counts, we use a Gaussian copula model with negative binomial marginals, which was presented in Section 5.2. The negative binomial specification is suitable for this application because patent counts exhibit a heavy right tail, and hence it is useful to specify a model that can account for the possible presence and extent of over-dispersion. In addition to examining how patents in each category are affected by firm characteristics, joint modeling allows us to study the interdependence of patent counts that emerges due to technological spillovers, managerial incentives, and internal R&D decisions. For instance, technological breakthroughs and know-how in one area may produce positive externalities and spill over to other areas. Moreover, significant discoveries may produce patents in multiple categories, resulting in positive correlation among patent counts. Alternatively, the advancement of a particular technology may cause a firm to refocus and concentrate its resources on that area at the expense of others, thereby producing negative correlations. The dependence structure embodied in the correlation matrix Ω of the Gaussian copula model that we consider is intended to capture these and other factors that can affect multiple patent categories simultaneously.
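To make the structure of the copula model concrete, the sketch below simulates from it: latent z ~ N(0, Ω) is mapped through the standard normal CDF Φ and then through each negative binomial quantile function. The parameterization is an assumption, since Section 5.2 is not reproduced here: each marginal has mean μ_j and size (dispersion) parameter α_j, so Var(y_j) = μ_j + μ_j²/α_j and a small α_j means strong over-dispersion. The function names and the two-category example values are hypothetical.

```python
import math
import numpy as np

def nb_quantile(u, mu, size, max_k=100_000):
    """Smallest k with P(Y <= k) >= u for a negative binomial with mean mu
    and size parameter `size`, so that Var(Y) = mu + mu**2 / size."""
    p = size / (size + mu)
    k = 0
    pmf = p ** size                    # P(Y = 0)
    cdf = pmf
    while cdf < u and k < max_k:       # max_k guards against rounding when u is near 1
        k += 1
        pmf *= (k - 1 + size) / k * (1.0 - p)   # P(Y = k) from P(Y = k - 1)
        cdf += pmf
    return k

def simulate_copula_nb(mu, size, omega, n_draws, seed=0):
    """Draw n_draws count vectors from a Gaussian copula with NB marginals."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(omega, dtype=float))
    z = rng.standard_normal((n_draws, len(mu))) @ chol.T   # rows ~ N(0, Omega)
    counts = np.empty(z.shape, dtype=int)
    for i in range(n_draws):
        for j in range(len(mu)):
            u = 0.5 * (1.0 + math.erf(z[i, j] / math.sqrt(2.0)))  # Phi(z)
            counts[i, j] = nb_quantile(u, mu[j], size[j])
    return counts

y = simulate_copula_nb(mu=[3.0, 5.0], size=[0.8, 0.6],
                       omega=[[1.0, 0.5], [0.5, 1.0]], n_draws=20_000)
print(y.mean(axis=0), np.corrcoef(y[:, 0], y[:, 1])[0, 1])
```

Note that the correlation between the simulated counts cannot exceed the latent correlation in Ω in absolute value, because the maximal correlation of a bivariate normal pair equals its correlation coefficient; Ω should therefore be read as a latent dependence parameter rather than as the correlation of the counts themselves.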

We estimate the copula model by first estimating the parameters of each negative binomial model separately by maximum likelihood and then using those estimates as a starting point for maximizing the copula log-likelihood. The individual negative binomial models have well-behaved log-likelihood functions and are relatively fast and straightforward to estimate by standard optimization techniques such as those presented in Section 5.3.2. Parameter estimates for the independent negative binomial models and the joint Gaussian copula model are presented in Table 5.2.



Table 5.2 Maximum simulated likelihood estimates of independent negative binomial (NB) models and joint Gaussian copula model with standard errors in parentheses

                    Independent NB Models                  Gaussian Copula Model
                  COM      CHS       CP       IS         COM      CHS       CP       IS
Intercept       0.968   -1.712   -5.834   -2.105       0.917   -1.471   -6.099   -2.033
              (0.993)  (0.939)  (1.682)  (0.568)     (1.040)  (0.986)  (1.645)  (0.628)
ln(SALES)      -0.297   -0.202    0.084   -0.190      -0.285   -0.194    0.242   -0.181
              (0.254)  (0.233)  (0.423)  (0.122)     (0.270)  (0.247)  (0.417)  (0.128)
ln(WF)          0.763    0.353    0.273    0.218       0.759    0.319    0.085    0.219
              (0.210)  (0.194)  (0.378)  (0.140)     (0.222)  (0.203)  (0.376)  (0.147)
ln(RDC)         0.081    0.611    0.717    0.631       0.078    0.580    0.665    0.608
              (0.122)  (0.091)  (0.120)  (0.080)     (0.128)  (0.105)  (0.148)  (0.089)
ln(α_j)        -0.174   -0.017   -0.564   -0.464      -0.184    0.026   -0.563   -0.451
              (0.090)  (0.091)  (0.131)  (0.110)     (0.101)  (0.098)  (0.150)  (0.119)

Ω (Gaussian copula model only)
            COM      CHS       CP       IS
COM       1.000
        (0.000)
CHS       0.072    1.000
        (0.070)  (0.000)
CP        0.119    0.313    1.000
        (0.075)  (0.053)  (0.000)
IS       -0.080    0.225    0.115    1.000
        (0.074)  (0.063)  (0.080)  (0.000)



The results in Table 5.2 largely accord with economic theory. Of particular interest is the fact that in all cases the coefficients on ln(RDC) are positive, and for CHS, CP, and IS, they are also economically and statistically significant. Specifically, those point estimates are relatively large in magnitude and lie more than 1.96 standard errors away from zero, which is the 5% critical value for a two-sided test under asymptotic normality. This indicates that innovation in those categories is capital-intensive and that the stock of R&D capital is a key determinant of patenting activity. The results also suggest that, all else being equal, patents tend to be introduced by large firms, as measured by the size of the company workforce, ln(WF). The coefficient on that variable in the communications category is large and statistically significant, whereas in the other three categories the estimates are positive but not significant at the customary significance levels. Interestingly, and perhaps counter-intuitively, the coefficients on ln(SALES) are predominantly negative (with the exception of computer peripherals), and none are statistically significant. To explain this puzzling finding, economists have proposed a rationalization that has to do with signaling in the presence of asymmetric information. In particular, firms that do not have steady sales revenue, such as start-ups that have yet to establish a reliable customer base, are often cash constrained and may have to demonstrate their creditworthiness to potential lenders such as venture capitalists, banks, and individual investors in order to obtain loans. One way for such firms to exhibit their research innovations and overall productivity is to register patents. In this case patents serve a dual role: they protect the firm's innovations from infringement and also send a positive signal to potential outside stakeholders. In contrast, firms that have more reliable sources of revenue due to higher sales have lower incentives to patent their innovations and may instead opt to protect their intellectual property in other ways (e.g. by keeping trade secrets, entering into exclusive agreements with potential users of their technology, etc.). These considerations are especially relevant in the computers & instruments industry, where patents have short life cycles and can often be circumvented by competitors who "innovate around" registered research advances.
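The significance statements above rest on simple z-ratios, estimate divided by standard error, compared with 1.96. A minimal check using the Gaussian copula columns of Table 5.2 as transcribed here (the helper name is hypothetical):

```python
# Gaussian copula estimates and standard errors transcribed from Table 5.2
# (columns COM, CHS, CP, IS).
copula_estimates = {
    "ln(SALES)": ([-0.285, -0.194, 0.242, -0.181], [0.270, 0.247, 0.417, 0.128]),
    "ln(WF)":    ([0.759, 0.319, 0.085, 0.219], [0.222, 0.203, 0.376, 0.147]),
    "ln(RDC)":   ([0.078, 0.580, 0.665, 0.608], [0.128, 0.105, 0.148, 0.089]),
}
CATEGORIES = ["COM", "CHS", "CP", "IS"]
CRITICAL_5PCT = 1.96  # two-sided 5% critical value under asymptotic normality

def significant_categories(estimates, ses, crit=CRITICAL_5PCT):
    """Return the categories whose |estimate / standard error| exceeds crit."""
    return [cat for cat, b, se in zip(CATEGORIES, estimates, ses) if abs(b / se) > crit]

for name, (b, se) in copula_estimates.items():
    print(name, significant_categories(b, se))
```

The output singles out CHS, CP and IS for ln(RDC), only COM for ln(WF), and no category for ln(SALES), in line with the discussion of the estimates.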

Table 5.2 also illustrates that over-dispersion is a common feature of all four data series, as demonstrated by the low estimates of {ln(α_j)} across all categories in both the copula and the univariate regression models. As a consequence, allowing for over-dispersion by considering a negative binomial specification, as opposed to estimating a Poisson model, appears well justified.
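How large the implied over-dispersion is depends on the parameterization, which is fixed in Section 5.2 and not reproduced here. Assuming (to match the reading above, in which low ln(α_j) signals over-dispersion) that α_j is a size parameter with Var(y) = μ + μ²/α_j, the variance-to-mean ratio is 1 + μ/α_j, whereas a Poisson model forces it to equal 1. The mean μ = 5 below is a hypothetical value chosen only for illustration.

```python
import math

# Gaussian copula estimates of ln(alpha_j) from Table 5.2.
ln_alpha = {"COM": -0.184, "CHS": 0.026, "CP": -0.563, "IS": -0.451}

def variance_to_mean_ratio(mu, alpha):
    """Var(y)/E(y) for an NB with mean mu and size alpha: 1 + mu/alpha.
    A Poisson model would force this ratio to equal 1."""
    return 1.0 + mu / alpha

mu = 5.0  # hypothetical expected count, chosen only for illustration
for cat, la in ln_alpha.items():
    alpha = math.exp(la)
    print(f"{cat}: alpha = {alpha:.3f}, Var/E at mu={mu}: {variance_to_mean_ratio(mu, alpha):.2f}")
```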

In Table 5.2, the estimated dependence matrix Ω in the Gaussian copula model reveals interesting complementarities among patent categories and supports the case for joint modeling and estimation. Specifically, the estimates suggest that patents in the computer hardware & software category are the most strongly correlated with counts in the computer peripherals and information storage categories (estimated correlations of 0.313 and 0.225), while patents in the communications category are only mildly correlated with those in the remaining categories. To test formally for the relevance of the copula correlation structure in this context, one can use the likelihood ratio and Lagrange multiplier tests. The log-likelihood for the restricted model (the independent negative binomial specification) is $L_R = -4050.06$ and for the unrestricted model (the Gaussian copula model) it is $L_U = -4020.98$, leading to a likelihood ratio test statistic $-2(L_R - L_U) = 58.16$. This test statistic has an asymptotic $\chi^2$ distribution with 6 degrees of freedom (equal to the number of distinct off-diagonal elements in Ω) and a 5% critical value of 12.59, suggesting that the data strongly reject the restricted (independent negative binomial) specification. The Lagrange multiplier test statistic is constructed from the gradient $g_R = \partial \ln f(y|\psi_R)/\partial\psi$ and curvature $B_R = \sum_{i=1}^{n} \{\partial \ln f(y_i|\psi_R)/\partial\psi\}\{\partial \ln f(y_i|\psi_R)/\partial\psi\}'$ of the log-likelihood function of the Gaussian copula model, both evaluated at the restricted maximum likelihood estimate $\psi_R$. Note that this corresponds to the case when Ω is an identity matrix and the Gaussian copula model is equivalent to fitting the four negative binomial models separately (in fact, these are the starting values we use in optimizing the copula log-likelihood). The Lagrange multiplier test statistic $LM = g_R'(B_R)^{-1} g_R = 64.58$ has the same asymptotic $\chi^2$ distribution as the likelihood ratio statistic and also leads to strong rejection of the restricted model.
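Both statistics are short computations once the inputs are available. The sketch below reproduces the likelihood ratio test from the two log-likelihoods reported above, using the closed-form χ² tail probability that holds for an even number of degrees of freedom (so no statistics library is needed). The LM statistic needs the per-observation scores of the copula model at $\psi_R$, which are not reported, so its helper is only exercised on a hypothetical toy score matrix. All function names are illustrative.

```python
import math
import numpy as np

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df) with even df:
    exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

def likelihood_ratio_test(loglik_restricted, loglik_unrestricted, df):
    stat = -2.0 * (loglik_restricted - loglik_unrestricted)
    return stat, chi2_sf_even_df(stat, df)

def lagrange_multiplier_stat(scores_restricted):
    """LM = g_R' B_R^{-1} g_R from the n x k matrix of per-observation scores
    of the unrestricted model evaluated at the restricted estimate."""
    S = np.asarray(scores_restricted, dtype=float)
    g = S.sum(axis=0)          # g_R
    B = S.T @ S                # B_R, outer product of gradients
    return float(g @ np.linalg.solve(B, g))

# Log-likelihoods reported in the text; df = 6 distinct correlations in Omega.
lr_stat, p_value = likelihood_ratio_test(-4050.06, -4020.98, df=6)
print(f"LR = {lr_stat:.2f}, p = {p_value:.2e}")
```

In this outer-product form the LM statistic equals n times the uncentered R² from regressing a column of ones on the scores, which is a convenient sanity check on an implementation.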

Fig. 5.2 Numerical standard errors (NSE) of the log-likelihood estimate as a function of the MCMC sample size in the CRT method (the axes, but not the values, are on the logarithmic scale)

The parameter estimates and hypothesis tests presented above are based on maximizing an MCMC-based estimate of the log-likelihood function because that function is analytically intractable. However, because the variability intrinsic in simulation-based estimation can affect the reliability of the results, it is important to examine the extent to which the point estimates, standard errors, and test statistics are affected by the performance of the simulated likelihood algorithm. In Figure 5.2, we have plotted the numerical standard error of the estimated log-likelihood ordinate $\ln f(y|\psi)$ as a function of the simulation sample size G used in constructing the average in (5.14). The numerical standard error measures how much the estimated log-likelihood ordinate for fixed y and ψ would vary if the simulation necessary to evaluate $\ln f(y|\psi)$ were repeated using a new Markov chain. The figure demonstrates that the simulated likelihood algorithm performs very well: even with few MCMC draws, $\ln f(y|\psi)$ is estimated precisely enough to produce reliable parameter estimates and hypothesis tests. The low variability of $\ln f(y|\psi)$ in our example is especially impressive because the numerical standard errors are obtained as the square root of the sum of the variances of the n = 498 individual log-likelihood contributions. To be cautious, we have also verified the validity of the point estimates by initializing the algorithm at different starting values and also by estimating the model by Bayesian simulation methods similar to those proposed in ifTUl and iTBl, which do not rely on maximizing the log-likelihood.
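The aggregation described above (NSE of the total equals the square root of the summed variances of the n independent contributions) can be sketched as follows. The per-contribution step is an assumption: each contribution's estimate is treated, hypothetically, as the sample mean of G possibly autocorrelated MCMC draws whose variance is estimated by non-overlapping batch means. The CRT method's own variance formula is not reproduced in this section, so the batch-means step only stands in for it.

```python
import numpy as np

def batch_means_variance(draws, n_batches=20):
    """Variance of the sample mean of a (possibly autocorrelated) chain,
    estimated from non-overlapping batch means."""
    draws = np.asarray(draws, dtype=float)
    b = len(draws) // n_batches                      # batch length
    means = draws[: b * n_batches].reshape(n_batches, b).mean(axis=1)
    return float(means.var(ddof=1) / n_batches)

def loglik_nse(draws_per_observation, n_batches=20):
    """NSE of the summed log-likelihood: square root of the sum of the
    variances of the n independent contribution estimates."""
    total_var = sum(batch_means_variance(d, n_batches) for d in draws_per_observation)
    return float(np.sqrt(total_var))

rng = np.random.default_rng(0)
chains = rng.normal(size=(50, 2000))   # hypothetical: 50 observations, G = 2000 iid draws each
print(loglik_nse(chains))              # roughly sqrt(50 / 2000), i.e. about 0.16
```

Repeating the calculation for several values of G traces out a curve of the kind shown in Figure 5.2; for independent draws the NSE falls roughly like 1/sqrt(G).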

Fig. 5.3 Boxplots of the ratios of parameter standard errors estimated for each MCMC sample size setting G relative to those for G = 1500; the lines in the boxes mark the quartiles, the whiskers extend to values within 1.5 times the interquartile range, and outliers are displayed by "+"

At the end of Section 5.3.2 we discussed the possibility that in maximum simulated likelihood estimation the standard errors may be affected by simulation noise. To examine the extent to which variability in the log-likelihood estimate translates into downward biases in the standard errors of the parameter estimates, we compute the standard errors across several settings of the simulation size G, namely G ∈ {25, 50, 100, 500, 1500}. We then compare the behavior of the standard errors for lower values of G relative to those for large G. Figure 5.3 presents boxplots of the ratios of the parameter standard errors estimated for each setting of G relative to those at the highest value, G = 1500. The results suggest that while at lower values of G the standard error estimates are somewhat more volatile than at G = 1500, neither the volatility nor the possible downward bias in the estimates represents a significant concern. Because the CRT method produces very efficient estimates of the log-likelihood ordinate, such issues are not problematic even with small MCMC samples, although in practice G should be set conservatively high, subject to one's computational budget.
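The summary behind boxplots such as those in Figure 5.3 can be computed directly: divide each parameter's standard error at a given G by its value at the reference G, then take quartiles, whiskers at 1.5 times the interquartile range, and outliers. The sketch below is illustrative only; the function name is hypothetical, and the small input used to exercise it is made up rather than taken from the chapter's results.

```python
import numpy as np

def se_ratio_summary(se_by_G, reference_G):
    """For each G, ratios of parameter standard errors to those at reference_G,
    summarized as boxplot statistics (quartiles, 1.5*IQR whiskers, outliers)."""
    ref = np.asarray(se_by_G[reference_G], dtype=float)
    summary = {}
    for G, se in sorted(se_by_G.items()):
        if G == reference_G:
            continue
        r = np.asarray(se, dtype=float) / ref
        q1, med, q3 = np.percentile(r, [25, 50, 75])
        lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        inside = r[(r >= lo) & (r <= hi)]
        summary[G] = {
            "quartiles": (q1, med, q3),
            "whiskers": (inside.min(), inside.max()),  # most extreme values within 1.5*IQR
            "outliers": r[(r < lo) | (r > hi)].tolist(),
        }
    return summary

# Made-up standard errors for five parameters, only to show the call.
example = {1500: [1.0, 1.0, 1.0, 1.0, 1.0], 25: [0.9, 1.0, 1.1, 1.0, 2.0]}
print(se_ratio_summary(example, reference_G=1500))
```

A median ratio noticeably below 1 at small G would be the signature of the downward bias discussed in Section 5.3.2; a wide box without such a shift indicates volatility only.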



5.5 Concluding Remarks 

This chapter has discussed techniques for obtaining maximum simulated likelihood estimates in the context of models for discrete data, where the likelihood function is obtained by MCMC simulation methods. These methods provide continuous and differentiable estimates that enable the application of widely used derivative-based techniques for obtaining parameter standard errors and test statistics. Because we are maximizing a log-likelihood function, we rely on the BHHH outer product of gradients method, which replaces the Hessian matrix of the log-likelihood with a cheaper and more stable approximation. The methodology is applied in a study of the joint behavior of four categories of U.S. technology patents using a Gaussian copula model for multivariate count data. The results support the case for joint modeling and estimation of the patent categories and suggest that the estimation techniques perform very well in practice. Additionally, the CRT estimates of the log-likelihood function are very efficient and produce reliable parameter estimates, standard errors, and hypothesis test statistics, mitigating any potential problems (discussed at the end of Section 5.3.2) that could arise due to maximizing a simulation-based estimate of the log-likelihood function.

We note that the simulated likelihood methods discussed here can be applied in 
optimization algorithms that do not require differentiation, for example in simulated 
annealing and metaheuristic algorithms, which are carefully examined and summarized in [22]. At present, however, due to the computational intensity of evaluating
the log-likelihood function at each value of the parameters, algorithms that require 
numerous evaluations of the objective function can be very time consuming, espe- 
cially if standard errors have to be computed by bootstrapping. Nonetheless, the 
application of such algorithms is an important new frontier in maximum simulated 
likelihood estimation. 
Chapter 6

Optimizing Complex Multi-location Inventory Models Using Particle Swarm Optimization

Christian A. Hochmuth, Jörg Lässig, and Stefanie Thiem



Abstract. The efficient control of logistics systems is a complicated task. Analytical models allow one to estimate the effect of certain policies. However, they necessitate the introduction of simplifying assumptions, and therefore, their scope is limited. To surmount these restrictions, we use Simulation Optimization by coupling a simulator that evaluates the performance of the system with an optimizer. This idea is illustrated for a very general class of multi-location inventory models with lateral transshipments. We discuss the characteristics of such models and introduce Particle Swarm Optimization for their optimization. Experimental studies show the applicability of this approach.



6.1 Introduction 

Reducing cost and improving service are the keys to success in a competitive economic climate. Although these objectives seem contradictory, they can be achieved together: spreading service locations improves service, and pooling resources can decrease cost if lateral transshipments are allowed between the locations. The design and control of such multi-location systems is an important and non-trivial task. Therefore, we need suitable mathematical models, Multi-Location Inventory Models with lateral Transshipments (MLIMT), to describe the following situation [14]. A given number of locations has to meet a demand for some products during a defined planning horizon. Each location can replenish its stock either by ordering from an outside supplier or by transshipments from other locations. The problem is to define ordering and transshipment decisions (OD and TD) that optimize defined performance measures for the whole system.

The MLIMT presented here has been developed with respect to three aspects. First, we assume discrete review for ordering in conjunction with continuous review for transshipments. Second, we propose an MLIMT simulator that is as general as possible, in order to remove the restrictions of existing studies (see Section 6.2) and to ensure broad applicability. Third, we follow a Simulation Optimization (SO) approach by iteratively connecting the MLIMT simulator with a Particle Swarm Optimization (PSO) algorithm to investigate the search space. Hence, we contribute to the application of SO by describing a multi-location inventory model that is far more general, and thus more complex, than preceding models. We show that it is possible to evaluate, and thus to optimize, policies for arbitrary demand processes as well as arbitrary ordering, demand satisfaction, pooling and transshipment modes. In fact, SO provides a solution approach for the optimal control of complex logistics networks. Moreover, we contribute to the methodology of SO by integrating a PSO algorithm for this specific application. So far, PSO has been implemented for a Single-Warehouse, Multi-Retailer system by Köchel and Thiem [23], and we aim to show its applicability to even more complex logistics networks.

After a discussion of related work in Section 6.2 and a brief introduction of the SO approach in Section 6.3, we present the characteristics of a general MLIMT and delve into the implemented simulation model in Section 6.4. In Section 6.5 we describe the applied PSO in detail. Experimental studies are discussed in Section 6.6, followed by concluding remarks in Section 6.7.



6.2 Related Work 

At present a great variety of models and approaches exists for this decision problem. The most common and most broadly investigated class of models assumes a single product, discrete review, independent and identically distributed demand, backlogging, complete pooling, emergency lateral transshipments at the end of a period, zero lead times, linear cost functions, and the total expected cost criterion as performance measure (see Köchel [19] and Chiou [3] for reviews). However, MLIMTs generally do not allow analytical solutions due to transshipments: TDs change the state of the system and thereby influence the OD, so it is impossible to determine the total consequences of an OD. Approximate models and simulations are alternatives; see e.g. Köchel [18, 19] and Robinson [30]. Additional problems connected with TDs arise for continuous review models. One is to prevent undesirable back-and-forth transshipments. This is closely connected with the problem of forecasting the demand during the transshipment time and the time interval elapsing from the release moment of a TD until the next order quantity arrives. Therefore, continuous review MLIMTs are usually investigated under several simplifying assumptions, e.g., two locations [7, 33], Poisson demand [25], a fixed ordering policy not considering future transshipments [27], restriction to simple rules such as a one-for-one ordering policy [25] and an all-or-nothing transshipment policy [7], or the limitation that at most one transshipment with negligible time and a single shipping point during an order cycle is possible [33]. Nowhere has the question of optimal ordering and transshipment policies been answered. All models work with a given ordering policy and heuristic transshipment rules. In a few cases, simulation is used either for testing approximate analytical models [27, 33] or for determining the best reorder points s for an (s, S)-ordering policy [24] by linear search and simulation. Thus, the investigations are restricted to small-size models.

Herer et al. [12] calculate optimal order-up-to levels S using a sample-path-based optimization procedure and subsequently find optimal transshipment policies for given ordering policies by applying linear programming to a network flow model. Extensions include finite transportation capacities [28] and positive replenishment lead times [10]. Furthermore, we investigate the effect of non-stationary transshipment policies under continuous review. The complexity of this general model thus motivates the application of simulation-based optimization with PSO instead of gradient-based methods. In this regard, we follow an approach similar to Arnold et al. [1] and, more recently, Belgasmi et al. [2], who analyze the effect of different parameters using evolutionary optimization.



6.3 Simulation Optimization 

The Simulation Optimization (SO) approach is well known in the field of Industrial Engineering. Its key advantage is that various performance measures can be optimized for practically arbitrary models. Among many others, Guariso et al. [11], Willis and Jones [32], and Iassinovski et al. [15] introduce comprehensive SO frameworks. Some notable examples are finding optimal order policies [alii]*, sequencing and lot-sizing in production [16], production planning and control in remanufacturing [26], and optimizing multi-echelon inventory systems [22]. For a review of SO in general and with regard to inventory problems see Köchel
In general, SO comes in two distinguishable flavors. Non-iterative (non-recursive or retrospective) SO decouples simulation and optimization, while iterative (recursive or prospective) SO integrates both functional components into a self-adapting search method. For an overview of SO approaches we refer to Fu et al. [9] and Köchel [20]. In the case of non-iterative SO, the objective function of the model is estimated by simulation prior to the optimization. Thus, to cover the search space with sufficient accuracy, extensive simulation is necessary, especially if the objective function is unknown. In contrast, in the case of iterative SO, simulation is used to evaluate actual solution candidates, and therefore, simulation is adapted to the current state of the search. The general idea of iterative SO is outlined in Figure 6.1. For a given decision problem an optimizer proposes candidate solutions. Using the results of simulation experiments, the performance of these candidate solutions is estimated. On the basis of the estimated performance, the optimizer decides to accept or reject the current decisions. Acceptance stops the search process, whereas rejection continues it.
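The iterative loop just described can be sketched in a few lines. Everything specific here is an assumption made for illustration: the one-location, lost-sales simulator `simulate_cost` with uniform integer demand and fixed cost rates, the random-neighbour proposal step, and the stopping rule (accept the incumbent after `max_stall` consecutive rejections) are hypothetical stand-ins, not the chapter's MLIMT simulator or its PSO. Replications reuse the same seeds for every candidate (common random numbers), which makes comparisons between candidates less noisy.

```python
import random

def simulate_cost(order_up_to, n_periods=200, seed=None):
    """Hypothetical toy simulator: one location, order-up-to policy, zero lead
    time, lost sales; returns the average holding + shortage cost per period."""
    rng = random.Random(seed)
    holding, shortage = 1.0, 9.0
    total = 0.0
    for _ in range(n_periods):
        stock = order_up_to                      # replenish up to S at period start
        demand = rng.randint(0, 20)              # assumed uniform integer demand
        total += holding * max(stock - demand, 0) + shortage * max(demand - stock, 0)
    return total / n_periods

def estimate(candidate, replications=10):
    # performance analysis: average over replications with fixed seeds (CRN)
    return sum(simulate_cost(candidate, seed=r) for r in range(replications)) / replications

def iterative_so(start, max_stall=50, seed=0):
    rng = random.Random(seed)
    best, best_cost = start, estimate(start)
    stall = 0
    while stall < max_stall:                     # accept incumbent after max_stall rejections
        candidate = max(0, best + rng.choice([-2, -1, 1, 2]))   # optimizer proposes
        cost = estimate(candidate)                              # simulator evaluates
        if cost < best_cost:
            best, best_cost, stall = candidate, cost, 0         # improvement: keep searching
        else:
            stall += 1                                          # rejection: continue
    return best, best_cost

print(iterative_so(start=5))
```

For this toy model the cost-minimizing level is close to the 90% demand quantile (shortage cost 9 against holding cost 1), which the search should approach from the poor starting level S = 5.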



[Figure 6.1: boxes labeled Problem, Optimizer, Candidate solutions, Simulator, Performance analysis, Solution]
Fig. 6.1 Scheme for the iterative Simulation Optimization approach. The optimizer proposes new candidate solutions for the given problem, whose performance is estimated by simulation experiments. Depending on the estimated performance, the optimizer decides either to accept the current decisions and stop the search process or to reject them and continue



As seen from Figure 6.1, iterative SO is based upon two main elements: a simulator for the system to be investigated and an optimizer that finds acceptable solutions. Generic simulators and optimizers are compatible, and thus SO is suited to the solution of arbitrarily complex optimization problems. In the past, different applications of the outlined approach, especially to inventory problems P2U I22J llfjl 12011, have been implemented. In most cases Genetic Algorithms (GA) have been applied so far. Like GAs, however, Particle Swarm Optimization (PSO) is suitable for very general optimization problems. Unlike gradient-based approaches, both can escape local optima. Hence, these methods are well suited to unknown or complicated fitness landscapes. They are not guaranteed to find the global optimum, but they usually return a very good solution in reasonable time. Furthermore, they rely only on a small amount of information and can be designed independently of the application domain. It will be interesting to see whether PSO deals as well as GAs with the random output of stochastic simulation.
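For readers unfamiliar with PSO, the sketch below shows a standard global-best swarm with an inertia weight, applied to a noisy objective of the kind a stochastic simulator produces. It is not the PSO variant described later in this chapter: the parameter values (w = 0.7, c1 = c2 = 1.5), the box-clipping rule, and the quadratic stand-in for a simulator (`noisy_cost`, minimum at (3, 3), noise averaged over five replications) are all assumptions for illustration.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                                      # velocities
    pbest = x.copy()                                          # personal bests
    pbest_val = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()                    # global best
    for _ in range(n_iter):
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)                      # stay inside the box
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

noise_rng = np.random.default_rng(42)

def noisy_cost(p, replications=5):
    """Hypothetical stand-in for a stochastic simulator: quadratic cost with
    minimum at (3, 3) plus simulation noise averaged over replications."""
    noise = noise_rng.normal(scale=0.5, size=replications).mean()
    return float(np.sum((np.asarray(p) - 3.0) ** 2) + noise)

best, best_val = pso_minimize(noisy_cost, lower=[0, 0], upper=[10, 10])
print(best, best_val)
```

Under noise the stored best values are optimistic, because the swarm keeps whichever evaluation happened to draw favourable noise; re-evaluating the returned solution with more replications is a prudent final step.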
6.4 Multi-Location Inventory Models with Lateral Transshipments

6.4.1 Features of a General Model 

A simulation model can in principle represent any real system with arbitrary accuracy. However, our objective is not to over-size a simulation model for all possible inventory systems. Instead, we develop a simulation model capable of evaluating solutions for an important class of systems in reasonable time. First, we describe such a general Multi-Location Inventory Model with Lateral Transshipments (MLIMT). The visible complexity of an MLIMT is of course determined by its number of locations N. With respect to analytical tractability, the cases N = 2 and N > 2 must be distinguished. Since analytical models are largely confined to the former, their limitations are obvious, and in order to solve real-world problems it is crucial to surmount this restriction. The general case is illustrated in Figure 6.2.



[Figure 6.2: locations 1, 2, ..., N receiving orders and facing demand, connected by transshipments]

Fig. 6.2 Logical view of a general Multi-Location Inventory Model with Lateral Transship- 
ments (MLIMT). Each of the N locations may refill its stock either by ordering from an 
outside supplier or by transshipments from other locations in order to meet the demand. The 
locations are on an equal level without any predefined structure 



Each of the N locations faces a certain demand for a single product or a finite 
number of products. In the latter case a substitution order between products may be 
defined. Most approaches assume a single product. For the consideration of multiple 
products, sequential simulation and optimization is feasible, unless fixed cost or 
finite resources are shared among products. However, this limitation is negligible 
provided that shared fixed cost is insignificant relative to total fixed and variable 
cost, and capacities for storage and transportation are considered to be infinite. 

The ordering mode defines when to order, i.e., the review scheme, and what 
ordering policy to use. The review scheme defines the time moments for order- 
ing. Discrete and continuous review are the alternatives. Under the discrete review 
scheme the planning horizon is divided into periods. Usually the ordering policy 
is defined by its type and corresponding parameters (e.g., order-up-to, one-for-one, (s, S), (R, Q)).
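The policy types listed above can be written as ordering rules applied at a review moment to the inventory position (on-hand stock plus outstanding orders minus backorders). The helper functions below are hypothetical, and the tie-breaking conventions are assumptions because the text does not fix them: (s, S) orders when the position is at or below s, and (R, Q) is read as reorder point R with batch size Q. Other sources use strict inequalities or interpret (R, Q) differently.

```python
import math

def order_up_to(ip, S):
    """Order-up-to level S: raise the inventory position ip back to S."""
    return max(S - ip, 0)

def one_for_one(demand_since_last_review):
    """One-for-one (base-stock): reorder exactly what was demanded."""
    return demand_since_last_review

def s_S(ip, s, S):
    """(s, S): if ip has dropped to s or below, order up to S; otherwise nothing."""
    return S - ip if ip <= s else 0

def R_Q(ip, R, Q):
    """(R, Q) read as reorder point R and batch size Q: order the smallest
    multiple of Q that lifts ip above R."""
    if ip > R:
        return 0
    return Q * (math.floor((R - ip) / Q) + 1)

print(order_up_to(7, 10), s_S(4, 5, 10), R_Q(-6, 5, 4))  # 3 6 12
```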

Central to the model specification is the definition of the demand process. It may be deterministic or random, identical or different for all locations, stationary or non-stationary in time, independent or dependent across locations and time, with complete or incomplete information. To draw a reliable picture of the system performance, it is crucial to assume a realistic demand process, and thus to track and extrapolate orders in a real-world system.

Arriving demand is handled at each individual location according to the demand 
satisfaction mode. It is common to assume a queue for waiting demand. By defin- 
ing an infinite, finite or zero queueing capacity, the backlogging, intermediate and 
lost-sales cases are distinguished. The service policy defines at what position an arriving client is enqueued. Finally, clients may leave the location after a random impatience time if their demand is served only partially or not at all.
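The role of the queue capacity can be illustrated with a single demand event at one location. This is a deliberately simplified, hypothetical sketch: unit requests are served first-come first-served, excess requests beyond the queue capacity are lost, and random impatience times and other service policies are not modelled. A capacity of `math.inf` gives backlogging, 0 gives lost sales, and a finite positive value the intermediate case.

```python
import math

def satisfy_demand(on_hand, waiting, arriving, capacity):
    """One demand event at a location with FIFO service of unit requests.

    capacity: queue capacity for unmet demand; math.inf -> backlogging,
    0 -> lost sales, finite positive -> intermediate case.
    Returns (on_hand, waiting, served, lost).
    """
    queued = waiting + arriving                  # earlier requests are served first
    served = min(on_hand, queued)
    on_hand -= served
    queued -= served
    lost = 0 if queued <= capacity else int(queued - capacity)  # overflow leaves
    return on_hand, queued - lost, served, lost

print(satisfy_demand(5, 0, 8, math.inf))  # (0, 3, 5, 0): backlogging
print(satisfy_demand(5, 0, 8, 0))         # (0, 0, 5, 3): lost sales
```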

Clearly, it may be advantageous to balance excess and shortage among locations 
to serve demand that would otherwise be lost. The pooling mode comprises all rules 
by which the on-hand inventory is used to respond to shortages in the MLIMT. 
Pooling may be complete or partial and defines which locations and what amount 
of available product units is pooled. Furthermore, we distinguish between pooling 
of stocks and pooling of residuals. 

Still, the transshipment mode must define when to transship and what transshipment policy to use. There may be preventive lateral transshipments to anticipate a stock-out or emergency lateral transshipments after a stock-out is observed. For example, preventive lateral transshipments may be allowed at the beginning of a period
in a discrete review model, i.e., before demand realization, or at a given moment dur- 
ing a period, i.e., after partially realized demand. Emergency lateral transshipments 
are usually allowed at the end of a period after realization of the demand. 

Transshipments are especially reasonable if the transshipment lead time is prac- 
tically negligible. In general, lead times for orders and transshipments of product 
units may be positive constants or random. Again, a distinction with respect to an- 
alytical tractability can be made. Although the effect of lead times for real-world 
systems may be pronounced, many analytical models assume zero lead times. 

To measure the system performance, cost and gain functions are defined. Costs may be incurred for ordering, storing and transshipping product units as well as for waiting and lost demand. These functions may be linear, linear with a set-up part, or generally non-linear. A location may also earn gains from sold units. The cost is tracked over a planning horizon, which may be finite or infinite. In the case of periodic review it may consist of a single period.

Finally, various cost criteria can serve as the optimization criterion, such as total
expected cost, total expected discounted cost and long-run average cost, as well as
non-cost criteria such as service rates or expected waiting times. Both types of
criteria can be combined to formulate a multi-objective problem. Alternatively, one
criterion is optimized while given restrictions must be satisfied for the others, or
aspects such as service are represented by cost functions, e.g., out-of-stock cost.

The most common class of models assumes a single product, discrete review, demand that
is independent and identically distributed across time, backlogging, complete pooling,
emergency lateral transshipments at the end of a period, zero lead times, linear cost
functions, and a total expected cost criterion. However, even for this simple type of
MLIMT an analytical solution is not possible in general because of the transshipments at
the end of a period. Potential transshipments have to be taken into account for the
ordering decision at the beginning of a period. But after the demand realization at the
end of a period, the optimal transshipment decision results from an open (linear)
transportation problem. Such problems do not have closed-form solutions. Therefore,
prior to the demand realization no expression is available for the cost savings from
transshipments. Both approximate models and simulation are potential solutions, e.g.,
Köchel [18, 19] and Robinson [30].
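The end-of-period transshipment decision described above can be made concrete with a small example. The following Python sketch solves an open transportation problem as a min-cost flow using successive shortest paths (Bellman-Ford, because the net unit costs are negative). It assumes that leaving one unit of shortage at location j costs `penalty[j]` and that moving one unit from i to j costs `cost[i][j]`, so each shipped unit saves `penalty[j] - cost[i][j]`; units are only shipped while that saving is positive. The function name `transshipment_plan` and this cost model are illustrative assumptions, not the models of the cited references.

```python
from itertools import product

INF = float("inf")


def transshipment_plan(surplus, shortage, cost, penalty):
    """Return ({(i, j): units}, total savings) for end-of-period transshipments.

    surplus[i]  : units location i can spare after demand realization
    shortage[j] : unmet demand at location j
    cost[i][j]  : transshipment cost per unit from i to j
    penalty[j]  : cost per unit of shortage left unserved at j (assumption)
    """
    n_s, n_d = len(surplus), len(shortage)
    src, snk = n_s + n_d, n_s + n_d + 1
    graph = [[] for _ in range(n_s + n_d + 2)]

    def add_edge(u, v, cap, c):
        # edge = [target, residual capacity, unit cost, index of reverse edge]
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(n_s):
        add_edge(src, i, surplus[i], 0)
    for j in range(n_d):
        add_edge(n_s + j, snk, shortage[j], 0)
    for i, j in product(range(n_s), range(n_d)):
        # a negative edge cost is the net saving of moving one unit from i to j
        add_edge(i, n_s + j, min(surplus[i], shortage[j]), cost[i][j] - penalty[j])

    savings = 0
    while True:
        # Bellman-Ford shortest path in the residual graph
        dist = [INF] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0
        for _ in range(len(graph) - 1):
            changed = False
            for u, edges in enumerate(graph):
                if dist[u] == INF:
                    continue
                for k, (v, cap, c, _) in enumerate(edges):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v], changed = dist[u] + c, (u, k), True
            if not changed:
                break
        if dist[snk] >= 0:  # no remaining path that still saves money
            break
        flow, v = INF, snk
        while v != src:  # bottleneck capacity along the path
            u, k = prev[v]
            flow = min(flow, graph[u][k][1])
            v = u
        v = snk
        while v != src:  # augment along the path
            u, k = prev[v]
            graph[u][k][1] -= flow
            graph[v][graph[u][k][3]][1] += flow
            v = u
        savings -= flow * dist[snk]

    plan = {}
    for i in range(n_s):
        for v, _, _, rev in graph[i]:
            if n_s <= v < n_s + n_d and graph[v][rev][1] > 0:
                plan[(i, v - n_s)] = graph[v][rev][1]
    return plan, savings


plan, savings = transshipment_plan([10, 5], [8, 6], [[1, 4], [2, 1]], [5, 5])
print(plan, savings)
```

Because the answer depends on the realized surpluses and shortages, the savings can only be evaluated after the demand is known, which is exactly why no closed-form expression is available at ordering time.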

6.4.2 Features of the Simulation Model 

The simulation model offers features that allow very general situations to be mapped.
The simulator is in principle suited for models with an arbitrary number of independent
non-homogeneous locations, a single product, constant location-dependent delivery and
transshipment lead times, and unlimited transportation resources. The most important
extensions of existing models are the following.
With regard to the ordering mode, we assume a periodic review scheme with a fixed length
$t_{P,i}$ of the review period for orders at location $i$. In principle, arbitrary
ordering policies can be realized within the simulation model; so far, $(s_i, S_i)$- and
$(s_i, nQ_i)$-ordering policies have been implemented.
Fig. 6.3 $(s_i, S_i)$-ordering policy. If the inventory position $r_i$ of location $i$ drops below the reorder
point $s_i$ at the end of an order period, an order is released to the order-up-to level $S_i$, i.e., $S_i - r_i$
product units are ordered. Analogously, but under continuous review, a transshipment order
(TO) of $H_i - f_{TO,i}(t)$ product units is released if the state function $f_{TO,i}(t)$ falls below $h_i$
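The periodic ordering decision of Fig. 6.3 can be sketched in a few lines of Python. The $(s_i, S_i)$ rule follows the caption directly. The chapter does not spell out the $(s_i, nQ_i)$ rule, so the second function assumes the common textbook reading (order the smallest multiple of $Q_i$ that lifts the inventory position to at least $s_i$); both function names are hypothetical.

```python
import math


def order_s_S(r, s, S):
    """(s_i, S_i): at a review, if inventory position r < s, order up to S."""
    return S - r if r < s else 0


def order_s_nQ(r, s, Q):
    """(s_i, nQ_i), assumed definition: order the smallest multiple of Q
    that raises the inventory position r to at least s."""
    if r >= s:
        return 0
    return math.ceil((s - r) / Q) * Q


print(order_s_S(300, 400, 900), order_s_nQ(150, 400, 100))
```

With $r_i = 300$, $s_i = 400$ and $S_i = 900$, the $(s_i, S_i)$ rule orders 600 units; the assumed $(s_i, nQ_i)$ rule with $Q_i = 100$ and $r_i = 150$ orders three batches.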



Clients arrive at the locations according to a compound renewal demand process. Such a
process is described by two independent random variables $T_i$ and $B_i$ for the
inter-arrival time of clients at location $i$ and their demand, $i = 1, \dots, N$,
respectively. Because individual arrivals are simulated, exact holding and penalty costs
can be calculated, which is not the case for models with discrete review, where the whole
demand of a period is transferred to the end of the period. That disadvantage does not
exist for models with continuous review. However, almost all such models assume a Poisson
demand process, which is a strong restriction as well.

Concerning the demand satisfaction mode, most models assume either the back-order or the
lost-sales case. An arriving client is enqueued according to a specific service policy,
such as First-In-First-Out (FIFO) and Last-In-First-Out (LIFO), which sort clients by
their arrival time, Smallest-Amount-Next (SAN) and Biggest-Amount-Next (BAN), which sort
clients by their unserved demand, and Earliest-Deadline-First (EDF). In addition, a
random impatient time is realized for each client.
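The service policies can be read as different sort keys on the queue of waiting clients. The Python sketch below assumes that a client's deadline for EDF is its arrival time plus its realized impatient time, and that available stock is handed out in policy order with partial service allowed; the names `Client`, `SERVICE_KEYS` and `serve` are hypothetical and not part of the MLIMT simulator.

```python
from dataclasses import dataclass


@dataclass
class Client:
    arrival: float    # arrival time
    unserved: float   # demand not yet served
    patience: float   # realized impatient time

    @property
    def deadline(self):
        # assumption: a client leaves at arrival + impatient time
        return self.arrival + self.patience


SERVICE_KEYS = {
    "FIFO": lambda c: c.arrival,
    "LIFO": lambda c: -c.arrival,
    "SAN": lambda c: c.unserved,
    "BAN": lambda c: -c.unserved,
    "EDF": lambda c: c.deadline,
}


def serve(queue, stock, policy):
    """Serve waiting clients in policy order from on-hand stock (partial service
    allowed); drop fully served clients from the queue and return the remaining stock."""
    for client in sorted(queue, key=SERVICE_KEYS[policy]):
        take = min(stock, client.unserved)
        client.unserved -= take
        stock -= take
        if stock == 0:
            break
    queue[:] = [c for c in queue if c.unserved > 0]
    return stock
```

Under EDF, a client who arrived later but is about to run out of patience is served before earlier arrivals, which is the behavior the experiments later compare against FIFO.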

To balance excess and shortage, the simulation model permits all pooling modes from
complete to time-dependent partial pooling. A symmetric $N \times N$ matrix $P = (p_{ij})$
defines pooling groups in such a way that two locations $i$ and $j$ belong to the same
group if and only if $p_{ij} = 1$, and $p_{ij} = 0$ otherwise. The following reflection is
crucial. Transshipments allow the fast elimination of shortages, but near the end of an
order period transshipments may be less advantageous. Therefore, a parameter
$t_{pool,i} \in [0, t_{P,i}]$ is defined for each location $i$. After the $k$-th order
request, location $i$ can receive transshipments from all other locations as long as the
current time $t$ satisfies $t < k\,t_{P,i} + t_{pool,i}$. At all other times, location $i$
can receive transshipments only from locations in the same pooling group. Thus, the
transshipment policies become non-stationary in time.
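The time-dependent pooling rule amounts to a simple eligibility test. In this Python sketch, `may_receive` is a hypothetical helper; it derives the index $k$ of the last order request from $k\,t_{P,i} \le t < (k+1)\,t_{P,i}$ and checks the complete-pooling window before falling back to the pooling matrix $P$.

```python
import math


def may_receive(i, j, t, t_P, t_pool, P):
    """True if location i may receive a transshipment from location j at time t."""
    if i == j:
        return False
    k = math.floor(t / t_P[i])          # review period after the k-th order request
    if t < k * t_P[i] + t_pool[i]:      # complete-pooling window of this period
        return True
    return P[i][j] == 1                 # afterwards only within the pooling group


# three locations in a chain: 0 - 1 - 2
P = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(may_receive(0, 2, 13.0, [10.0] * 3, [4.0] * 3, P))
```

Setting $t_{pool,i} = t_{P,i}$ keeps the window open for the whole period, i.e., complete pooling, while $t_{pool,i} = 0$ restricts location $i$ to its pooling group at all times.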

Transshipments are the focus of this chapter. Regarding the transshipment mode, our
simulation model allows transshipments at any time during an order cycle (continuous
review) as well as multiple shipping points and partial deliveries to realize a
transshipment decision (TD). To answer the question of when to transship what amount
between which locations, a great variety of rules can be defined. Broad applicability is
achieved by three main ideas: priorities, the introduction of a state function, and the
generalization of common transshipment rules. Difficulties arise because the effects of
a TD are hard to calculate. Therefore, TDs should be based on appropriate forecasts of
the model dynamics, especially of the stock levels. The MLIMT simulator offers several
possibilities. For each location, transshipment orders (TOs) and product offers (POs)
are distinguished. TOs and POs are considered at the arrival times of clients and
deliveries, respectively. Priorities define the sequence of transshipments in
one-to-many and many-to-one situations. Because time is continuous, events occur one at
a time, so only such situations arise, and thus all possible cases are covered. The
three rules Biggest-Amount-Next (BAN), Minimal-Transshipment-Cost per unit (MTC) and
Minimal-Transshipment-Time (MTT) may be combined arbitrarily. State functions are used
to decide when to release a TO or PO. The following variables for each location $i$ and
time $t \ge 0$ are used in further statements:

$y_i(t)$ Inventory level

$y_i^{\pm}(t) = \max(\pm y_i(t), 0)$ On-hand stock (+) and shortage (−), respectively

$b_{ord,i}(t)$ Product units ordered but not yet delivered

$b^{k}_{ord,i}$ Product units ordered in the $k$-th request

$b_{tr,i}(t)$ Transshipments on the way to location $i$

$r_i(t) = y_i(t) + b_{ord,i}(t) + b_{tr,i}(t)$ Inventory position

$t_{P,i}$ Order period time

$t_{A,i}$ Delivery lead time of an order

$n_{ord,i} = \lfloor t_{A,i} / t_{P,i} \rfloor$ Number of periods to deliver an order
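The derived quantities in this list follow mechanically from three state variables per location. The following Python sketch (the class `LocationState` and the function `periods_to_deliver` are hypothetical names) computes on-hand stock, shortage and inventory position, plus $n_{ord,i}$.

```python
import math
from dataclasses import dataclass


@dataclass
class LocationState:
    y: float       # inventory level y_i(t); negative values mean shortage
    b_ord: float   # ordered but not yet delivered, b_ord,i(t)
    b_tr: float    # transshipments on the way, b_tr,i(t)

    @property
    def on_hand(self):
        return max(self.y, 0.0)        # y_i^+(t)

    @property
    def shortage(self):
        return max(-self.y, 0.0)       # y_i^-(t)

    @property
    def position(self):
        return self.y + self.b_ord + self.b_tr   # r_i(t)


def periods_to_deliver(t_A, t_P):
    """n_ord,i = floor(t_A,i / t_P,i)."""
    return math.floor(t_A / t_P)
```

For example, a location with 20 units backlogged, 600 units on order and 50 units in transit has no on-hand stock but an inventory position of 630 units.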

To decide at time $t$ in location $i$ about a TO or PO, the state functions $f_{TO,i}(t)$
and $f_{PO,i}(t)$ are defined based on the available stock plus expected transshipments,
$f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$, and on the on-hand stock, $f_{PO,i}(t) = y_i^{+}(t)$,
respectively. Since fixed cost components for transshipments are feasible, a heuristic
$(h_i, H_i)$-rule for TOs with $h_i \le H_i$ is suggested, inspired by the
$(s_i, S_i)$-rule for order requests:

If $f_{TO,i}(t) < h_i$,

release a TO for $H_i - f_{TO,i}(t)$ product units.
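The $(h_i, H_i)$-rule can be sketched directly in Python. The function name `transshipment_order` is hypothetical; it uses the state function $f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$ without any forecast.

```python
def transshipment_order(y, b_tr, h, H):
    """(h_i, H_i)-rule: return the TO size, or 0 if no TO is released."""
    if h > H:
        raise ValueError("the rule expects h <= H")
    f_to = y + b_tr                 # f_TO,i(t) = y_i(t) + b_tr,i(t)
    return H - f_to if f_to < h else 0.0


print(transshipment_order(40, 10, 100, 150))
```

With $h_i = H_i$ the rule tops the state function up to $H_i$ whenever it drops below that level, which corresponds to the degenerate $(H_i, H_i)$ case discussed for linear transshipment costs without set-up part.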

However, in case of positive transshipment times it may be advantageous to take future
demand into account. Thus, a TO is released on the basis of a forecast of the state
function $f_{TO,i}(t')$ for a time moment $t' \ge t$, and the transshipment policies become
non-stationary in time. The MLIMT simulator offers three such time moments: $t' = t$, the
current time (i.e., no forecast); $t' = t_1$, the next order review moment; and
$t' = t_2$, the next potential moment of an order supply. For instance, consider the
state function $f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$. Let $k\,t_{P,i} \le t < (k+1)\,t_{P,i}$,
i.e., we assume that we are in the review period after the $k$-th order request. Then
$t_1$ is defined as follows.

$$t_1 = (k+1)\,t_{P,i} . \qquad (6.1)$$

For $t_2$ we introduce two events: $ev(t)$, meaning that in the actual period there has
not been an order supply until $t$, and its complement $\overline{ev}(t)$, meaning that
there has been an order supply until $t$. Then

$$t_2 = (k - n_{ord,i})\,t_{P,i} + t_{A,i} + \begin{cases} 0, & ev(t) \Leftrightarrow t < (k - n_{ord,i})\,t_{P,i} + t_{A,i} \\ t_{P,i}, & \overline{ev}(t) \Leftrightarrow t \ge (k - n_{ord,i})\,t_{P,i} + t_{A,i} \end{cases} \qquad (6.2)$$
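The case distinction in (6.1) and (6.2) can be checked with a short Python sketch. The function `forecast_moments` is a hypothetical helper; it returns the period index $k$, the two forecast moments, and whether $ev(t)$ holds.

```python
import math


def forecast_moments(t, t_P, t_A):
    """Return (k, t1, t2, ev) for one location following (6.1) and (6.2)."""
    k = math.floor(t / t_P)              # review period after the k-th order request
    n_ord = math.floor(t_A / t_P)        # periods needed to deliver an order
    supply = (k - n_ord) * t_P + t_A     # arrival of request k - n_ord (in this period)
    ev = t < supply                      # no order supply yet in the current period
    t1 = (k + 1) * t_P                   # (6.1)
    t2 = supply if ev else supply + t_P  # (6.2)
    return k, t1, t2, ev


print(forecast_moments(12.0, 10.0, 4.0))
```

With $t_{P,i} = 10$ d and $t_{A,i} = 4$ d, at $t = 12$ d the supply of the current period is still pending at day 14, so $t_2 = 14$; at $t = 15$ d it has arrived and the next supply moment moves to day 24.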

Using $m_i = \langle B_i \rangle / \langle T_i \rangle$ as the long-run demand per time unit at location $i$, the following
forecasts are used, illustrated in Figures 6.4 and 6.5.



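$$\hat{f}_{TO,i}(t) = f_{TO,i}(t) = y_i(t) + b_{tr,i}(t) . \qquad (6.3)$$

$$\hat{f}_{TO,i}(t_1) = f_{TO,i}(t) - m_i\,(t_1 - t) + \begin{cases} b^{\,k - n_{ord,i}}_{ord,i}, & ev(t) \\ 0, & \overline{ev}(t) \end{cases} \qquad (6.4)$$

$$\hat{f}_{TO,i}(t_2) = f_{TO,i}(t) - m_i\,(t_2 - t) . \qquad (6.5)$$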
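The three forecasts (6.3) to (6.5) differ only in the horizon and in whether a pending order supply is added. In this Python sketch, `forecast_f_to` and `mean_demand_rate` are hypothetical names; `pending` stands for $b^{k-n_{ord,i}}_{ord,i}$ and `ev` for the event $ev(t)$, and time units must match those of $m_i$.

```python
def mean_demand_rate(mean_B, mean_T):
    """m_i = <B_i> / <T_i>, long-run demand per time unit."""
    return mean_B / mean_T


def forecast_f_to(f_now, t, m, moment, t1=None, t2=None, pending=0.0, ev=False):
    """Forecast of f_TO,i at t' in {t, t1, t2} following (6.3) to (6.5)."""
    if moment == "t":
        return f_now                                          # (6.3), no forecast
    if moment == "t1":
        return f_now - m * (t1 - t) + (pending if ev else 0.0)  # (6.4)
    if moment == "t2":
        return f_now - m * (t2 - t)                           # (6.5)
    raise ValueError(f"unknown forecast moment {moment!r}")


# m = 60 units per day, current time day 12, review at day 20, supply of 600 pending
print(forecast_f_to(300.0, 12.0, 60.0, "t1", t1=20.0, pending=600.0, ev=True))
```

In the example, eight days of expected demand (480 units) are subtracted and the pending supply of 600 units is added, giving a forecast of 420 units for the next review moment.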

Fig. 6.4 Forecast functions for $ev(t) \Leftrightarrow t < (k - n_{ord,i})\,t_{P,i} + t_{A,i}$. In the actual period there has
not been an order supply until $t$. Thus, the time moment $t_2$ of the next order supply $k - n_{ord,i}$
is in the current period, and the supplied amount must be considered to forecast $f_i(t_1)$

Fig. 6.5 Forecast functions for $\overline{ev}(t) \Leftrightarrow t \ge (k - n_{ord,i})\,t_{P,i} + t_{A,i}$. In the actual period there
has been an order supply until $t$, and thus the time moment $t_2$ of the next order supply
$k + 1 - n_{ord,i}$ is in the next period, not affecting $f_i(t_1)$

Thus, by replacing the function $f_{TO,i}(t)$ with various forecast functions, a great
variety of rules can be described to control the release of TOs. We remark that in case
of linear transshipment cost functions without set-up part, the $(h_i, H_i)$-rule
degenerates to $(H_i, H_i)$. A well-designed optimization algorithm will approximate that
solution. Therefore, we generally work with the $(h_i, H_i)$-rule. To serve a TO, at least
one location has to offer some product quantity. To decide when to offer what amount, an
additional control parameter is introduced: the offering level $o_i$, corresponding to
the hold-back level introduced by Xu et al. [33]. Since only on-hand stock can be
transshipped, the state function $f_{PO,i}(t) = y_i^{+}(t)$ is defined. The offered amount
$y_i^{+}(t) - o_i$ must not be smaller than a certain value $\Delta o_{min,i}$ to prevent
undesirably small and frequent transshipments. Similar forecasts are applied to take
future demand into account, with forecast moments $t$, $t_1$ and $t_2$. For details we
refer to Hochmuth [13]. Thus, the PO rule is as follows.

If $f_{PO,i}(t) - o_i \ge \Delta o_{min,i}$,

release a PO for $f_{PO,i}(t) - o_i$ product units.
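The PO rule mirrors the TO rule on the supply side. A minimal Python sketch without forecasting, with the hypothetical name `product_offer`:

```python
def product_offer(on_hand, o, d_o_min):
    """PO rule: offer f_PO,i(t) - o_i units if that amount is at least
    delta_o_min,i, otherwise offer nothing (returns 0)."""
    amount = on_hand - o          # f_PO,i(t) = y_i^+(t)
    return amount if amount >= d_o_min else 0.0


print(product_offer(500.0, 0.0, 169.30))
```

A very large offer level $o_i$ or minimum offer quantity $\Delta o_{min,i}$ effectively disables offers at a location, which is how locations that never ship product units show up in the optimized parameter sets.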
Thus, the set of available transshipment policies is extended to include all commonly
used policies and to allow multiple shipping points with partial deliveries.

To measure the system performance by cost and gain functions, the order, holding,
shortage (waiting) and transshipment cost functions may consist of fixed values,
components linear in time, and components linear in time and units. A fixed cost arises
for each non-served demand unit. All cost values are location-related. The gain from a
unit sold by any location is a constant. To track costs over infinite planning horizons,
appropriate approximations must be used. With respect to finite horizons, the only
problem is the increase in computing time needed to obtain a sufficiently accurate
estimate, although this increase can be limited by parallelization.

By choosing the cost function components in a specific way, both cost and non-cost
criteria can be used, e.g., the average ratio of customers experiencing a stock-out or
the average queueing time, measured by out-of-stock cost, or the efficiency of logistics,
indicated by order and transshipment cost.



6.5 Particle Swarm Optimization 

Particle Swarm Optimization (PSO), originally proposed by Kennedy and Eberhart
[17], has been successfully applied to many real-world optimization problems in
recent years 1134. la. 12911. PSO uses the dynamics of swarms to find solutions to
optimization problems with a continuous solution space. Since then, many different
versions and additional heuristics have been introduced; here we restrict our
considerations to the Standard PSO 2007 algorithm by Clerc 0].

Like Genetic Algorithms or ensemble-based approaches [31], PSO is an iterative
population-based approach, i.e., PSO works with a set of feasible solutions, the swarm.
Let $N$ denote the number of swarm individuals (particles), i.e., the swarm size. The
basic idea of PSO is that all swarm individuals move partly randomly through the solution
space $\mathcal{S}$. Individuals share information about the best positions found so far,
$\mathbf{r}^{bsf}$, where each particle has $K$ informants. Additionally, each individual
$i$ has an internal memory to store its best-so-far (locally best) solution
$\mathbf{r}_i^{bsf}$. In every iteration, the movement of each individual, starting from
its current position $\mathbf{r}_i$, is then given by a trade-off between its current
velocity $\mathbf{v}_i$, a movement in the direction of its locally best solution
$\mathbf{r}_i^{bsf}$ (cognitive component), and a movement in the direction of the best
solution $\mathbf{r}^{bsf}$ known to its informants in the swarm (social component). Thus,
the equations of motion for one individual $i$ and a discrete time parameter $t$ are given
by



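$$\mathbf{v}_i^{t+1} = w \cdot \mathbf{v}_i^{t} + c_1 \cdot \mathcal{R}_1 \cdot (\mathbf{r}_i^{bsf} - \mathbf{r}_i^{t}) + c_2 \cdot \mathcal{R}_2 \cdot (\mathbf{r}^{bsf} - \mathbf{r}_i^{t}) \qquad (6.6)$$

$$\mathbf{r}_i^{t+1} = \mathbf{r}_i^{t} + \mathbf{v}_i^{t+1} . \qquad (6.7)$$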

The diagonal matrices $\mathcal{R}_1$ and $\mathcal{R}_2$ contain uniform random numbers in
$[0, 1)$ and thus randomly weight each component of the connecting vector
$(\mathbf{r}_i^{bsf} - \mathbf{r}_i)$ from the current position $\mathbf{r}_i$ to the locally
best solution $\mathbf{r}_i^{bsf}$. The vector $(\mathbf{r}^{bsf} - \mathbf{r}_i)$ is treated
analogously. Since every component is multiplied by a different random number, the
vectors are not only changed in length but also perturbed from their original direction.
The new position then follows from the superposition of the three vectors. By choosing
the cognitive parameter $c_1$ and the social parameter $c_2$, the influence of these two
components can be adjusted. The Standard PSO 2007 setup uses
$c_1 = c_2 = 0.5 + \ln 2 \approx 1.193$ and $w = 1/(2 \ln 2) \approx 0.721$, as proposed by
Clerc and Kennedy [5]. A swarm of $N = 100$ particles is chosen, with $K = 3$ informants.

The pseudo-code of the solution update for the swarm is shown in Algorithm 1. The
position and velocity components for the different particles $i$ and dimensions $d$ are
written as subscripts, i.e., $v_{id}$ is the $d$-th component of the velocity vector
$\mathbf{v}_i$ of particle $i$. The iterative solution update of the vector $\mathbf{v}_i$ is
visualized in Figure 6.6.



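Algorithm 1: Position and velocity update rule in PSO

Data: position $\mathbf{r}_i$, velocity $\mathbf{v}_i$, locally best position $\mathbf{r}_i^{bsf}$ and globally best position $\mathbf{r}^{bsf}$
for each particle $i$, cognitive parameter $c_1$ and social parameter $c_2$.
Result: $\{\mathbf{r}_i\}$ with $\mathbf{r}_i \in \mathcal{S}$ and updated velocities $\{\mathbf{v}_i\}$.
begin
    forall particles $i$ do
        forall dimensions $d$ do
            $R_1 \leftarrow$ get_uniform_random_number(0, 1);
            $R_2 \leftarrow$ get_uniform_random_number(0, 1);
            $v_{id} \leftarrow w \cdot v_{id} + c_1 \cdot R_1 \cdot (r^{bsf}_{id} - r_{id}) + c_2 \cdot R_2 \cdot (r^{bsf}_{d} - r_{id})$;
            $r_{id} \leftarrow r_{id} + v_{id}$;
    return $\{\mathbf{r}_i\}, \{\mathbf{v}_i\}$;
end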
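Algorithm 1 together with equations (6.6) and (6.7) can be turned into a small runnable optimizer. The following Python sketch uses the Standard PSO 2007 constants quoted above and a fixed random informant topology in which every particle informs itself and $K$ randomly drawn particles. The zero initial velocities, the clamping at the box bounds and the name `pso_minimize` are our own simplifications and may differ from the SPSO 2007 reference implementation.

```python
import math
import random

W = 1.0 / (2.0 * math.log(2.0))   # inertia weight w, about 0.721
C1 = C2 = 0.5 + math.log(2.0)     # cognitive c1 and social c2, about 1.193


def pso_minimize(f, lower, upper, n_particles=100, k_informants=3,
                 iterations=200, seed=0):
    """Minimize f over the box [lower, upper] with the update rule of Algorithm 1."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]   # simplified start (assumption)
    best = [p[:] for p in pos]                        # locally best positions r_i^bsf
    best_val = [f(p) for p in pos]
    # every particle informs itself and K randomly drawn particles (fixed topology)
    informants = [{i} for i in range(n_particles)]
    for i in range(n_particles):
        for j in rng.sample(range(n_particles), k_informants):
            informants[j].add(i)
    for _ in range(iterations):
        for i in range(n_particles):
            g = min(informants[i], key=lambda j: best_val[j])  # best informant
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()           # R1, R2 in [0, 1)
                vel[i][d] = (W * vel[i][d]
                             + C1 * r1 * (best[i][d] - pos[i][d])
                             + C2 * r2 * (best[g][d] - pos[i][d]))
                pos[i][d] += vel[i][d]
                # clamp to the box and stop the particle there (assumption)
                if not lower[d] <= pos[i][d] <= upper[d]:
                    pos[i][d] = min(max(pos[i][d], lower[d]), upper[d])
                    vel[i][d] = 0.0
            val = f(pos[i])
            if val < best_val[i]:
                best[i], best_val[i] = pos[i][:], val
    i_best = min(range(n_particles), key=lambda j: best_val[j])
    return best[i_best], best_val[i_best]


x, fx = pso_minimize(lambda p: sum(v * v for v in p), [-5.0] * 2, [5.0] * 2,
                     n_particles=30, iterations=300)
print(x, fx)
```

In the MLIMT setting, `f` would be a simulation run that returns the total cost for a candidate policy vector; the sphere function in the usage line only serves to check that the update rule converges.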



6.6 Experimentation 
6.6.1 System Setup 

Using the MLIMT simulator in combination with PSO, different scenarios are optimized for
the five-location model shown in Figure 6.7. We are particularly interested in the
effect of the following three factors, which are tested in combination.

First, we test the two service policies First-In-First-Out (FIFO) and
Earliest-Deadline-First (EDF). Second, for the transshipment orders (TOs) we either
monitor the current time, i.e., $t' = t$, or we use forecasting with $t' = t_1$, the next
order review moment. Third, two different pooling policies are applied. In the first
option, the stocks of all locations $i$ are completely pooled, i.e., $t_{pool,i} = t_{P,i}$.
Thus, only ordering and transshipment policies are optimized, not pooling. In the second
option, only locations next to each other in Figure 6.7 are in one pooling group, i.e.,
there are four different pooling groups. Therefore, lateral transshipments are limited
to adjacent locations at the end of an order period.
Fig. 6.6 Iterative solution update of PSO in two dimensions. From the current particle
position $\mathbf{r}_i^{t}$ the new position $\mathbf{r}_i^{t+1}$ is obtained by vector addition of the velocity, the cognitive and
the social component




Fig. 6.7 Topology of the five-location model. An edge between two locations indicates that 
these belong to the same pooling group, i.e., lateral transshipments are feasible at all times. 
Along the edges distances in kilometers are shown 



The other parameters are identical for all optimization runs. All locations $i$ use an
$(s_i, S_i)$-ordering policy, where the initial reorder points are $s_i = 0$ and the
initial order-up-to levels are $S_1 = 600$, $S_2 = 900$, $S_3 = 1{,}200$, $S_4 = 1{,}500$
and $S_5 = 1{,}800$. For all locations $i$, an order period is equal to 10 days, i.e.,
$t_{P,i} = 10$ d. The distances between all locations are visualized in Figure 6.7, and
the transshipment velocity is chosen to be 50.00 km/h. The state function chosen for TOs
is $f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$ and for POs $f_{PO,i}(t) = y_i^{+}(t)$. To analyze the
effect of forecasting demand for ordering transshipments, both the current time $t$ and
the forecast moment $t_1$ are compared for TOs. For offering product units, the current
time $t$ is used. The priority sequence for TOs and POs is MTC, BAN, MTT.
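One way to read the priority sequence MTC, BAN, MTT is as a lexicographic sort: cheapest lane first, ties broken by the larger offered amount, remaining ties by the shorter transshipment time. The Python sketch below assumes this reading and also shows a TO being served from several shipping points with partial deliveries; `Offer`, `prioritize` and `allocate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    location: int
    amount: float       # units the offering location can ship
    unit_cost: float    # transshipment cost per unit on this lane (MTC key)
    travel_time: float  # transshipment time on this lane (MTT key)


def prioritize(offers):
    """Order offers by MTC, then BAN (larger amount first), then MTT."""
    return sorted(offers, key=lambda o: (o.unit_cost, -o.amount, o.travel_time))


def allocate(request, offers):
    """Serve one TO from several shipping points with partial deliveries."""
    plan = []
    for offer in prioritize(offers):
        if request <= 0:
            break
        take = min(request, offer.amount)
        plan.append((offer.location, take))
        request -= take
    return plan


offers = [Offer(1, 50, 0.1, 2.0), Offer(2, 80, 0.1, 3.0), Offer(3, 100, 0.05, 5.0)]
print(allocate(150, offers))
```

Here location 3 is used first because it is cheapest; locations 1 and 2 tie on cost, so BAN prefers location 2 with the larger offer, which covers the remaining 50 units.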

The inter-arrival time of customers at a location $i$ is an exponentially distributed
random variable with $\langle T_i \rangle = 2$ h. The impatient time is triangularly
distributed in the interval (0 h, 8 h), i.e., $\langle W_i \rangle = 4$ h. The customer
demand is uniformly distributed in $[0, B_{max,i}]$ for all locations $i$, but with
different maximum values, i.e., $B_{max,1} = 10$, $B_{max,2} = 15$, $B_{max,3} = 20$,
$B_{max,4} = 25$ and $B_{max,5} = 30$, respectively. The initial inventory of the five
locations is chosen to be $I_{start,1} = 600$, $I_{start,2} = 900$, $I_{start,3} = 1{,}200$,
$I_{start,4} = 1{,}500$ and $I_{start,5} = 1{,}800$, respectively. However, the
initialization values have no influence after the transition time. The maximum storage
capacity is 10,000 product units for each location. The regular order delivery times at
the end of each period are location dependent as well: $t_{A,1} = 2.0$ d,
$t_{A,2} = 2.5$ d, $t_{A,3} = 3.0$ d, $t_{A,4} = 3.5$ d and $t_{A,5} = 4.0$ d.
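The demand setup can be reproduced as a compound renewal process. The Python sketch below assumes continuous uniform demand and a symmetric triangular impatient time with mode 4 h (the only triangle on (0 h, 8 h) with mean 4 h); `expected_period_demand` and `simulate_clients` are hypothetical names. With $m_i = \langle B_i \rangle / \langle T_i \rangle$, the expected demand per 10-day period is $60\,B_{max,i}$ units, which coincides with the initial order-up-to levels $S_i$ listed above; this is our observation, not a statement from the setup.

```python
import random

T_MEAN_H = 2.0                   # mean inter-arrival time <T_i> in hours
B_MAX = [10, 15, 20, 25, 30]     # B_max,i for locations 1..5
PERIOD_H = 10 * 24               # order period t_P,i = 10 days, in hours


def expected_period_demand(b_max, t_mean=T_MEAN_H, period=PERIOD_H):
    """m_i * t_P,i with m_i = <B_i>/<T_i> and <B_i> = b_max / 2 for U[0, b_max]."""
    return (b_max / 2.0) / t_mean * period


def simulate_clients(b_max, horizon_h, rng):
    """Yield (arrival time, demand, deadline) of clients at one location."""
    t = 0.0
    while True:
        t += rng.expovariate(1.0 / T_MEAN_H)      # exponential inter-arrival time
        if t > horizon_h:
            return
        demand = rng.uniform(0.0, b_max)           # continuous uniform demand (assumption)
        patience = rng.triangular(0.0, 8.0, 4.0)   # impatient time, mean 4 h
        yield t, demand, t + patience


print([expected_period_demand(b) for b in B_MAX])
```

A long simulated horizon should reproduce the long-run demand rate of $B_{max,i}/4$ units per hour, e.g., 2.5 units per hour at location 1.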

The cost for storing product units is 1.00 € per unit and day, whereas the order and
transshipment cost is 1.00 € per unit and per day of transportation time. The fixed
transshipment cost is 10.00 € for each location and the gain per unit sold is 100.00 €.
The out-of-stock cost per product unit and hour of waiting time is 1.00 €, and the
out-of-stock cost for a canceling customer is 50.00 €. The fixed cost for each periodic
order is 500.00 €, and the order cost per product unit and day is 1.00 €. The
optimization criterion is the minimum expected total cost.
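The cost components above combine fixed parts, parts linear in time, and parts linear in time and units. The following Python sketch evaluates a few of them with the stated values. It assumes that the fixed transshipment cost is charged once per transshipment, that travel time equals distance divided by the transshipment velocity, and that a canceling customer's waiting cost is charged in addition to the 50.00 € penalty; the 150 km distance in the example is hypothetical, since the actual distances are only shown in Figure 6.7.

```python
HOLDING_EUR_PER_UNIT_DAY = 1.00
TRANSSHIP_EUR_PER_UNIT_DAY = 1.00   # per unit and day of transportation time
TRANSSHIP_FIXED_EUR = 10.00         # assumed to be charged once per transshipment
WAIT_EUR_PER_UNIT_HOUR = 1.00
CANCEL_EUR = 50.00
VELOCITY_KMH = 50.0


def transshipment_cost(units, distance_km):
    travel_days = distance_km / VELOCITY_KMH / 24.0
    return TRANSSHIP_FIXED_EUR + TRANSSHIP_EUR_PER_UNIT_DAY * units * travel_days


def holding_cost(units, days):
    return HOLDING_EUR_PER_UNIT_DAY * units * days


def out_of_stock_cost(units, waiting_hours, canceled):
    cost = WAIT_EUR_PER_UNIT_HOUR * units * waiting_hours
    return cost + (CANCEL_EUR if canceled else 0.0)


print(transshipment_cost(100, 150.0))
```

Shipping 100 units over 150 km takes 3 h, i.e., 0.125 d, so the variable part is 12.50 € and the total, including the fixed part, is 22.50 €.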

The simulation time is 468 weeks plus an additional transition time of 52 weeks at the
beginning. For optimization we use PSO with a population of 100 individuals $i$, where an
individual is a candidate solution $\mathbf{r}_i$, i.e., a real-valued vector of the
following policy parameters for each location $i$.

$s_i$ Reorder point for periodic orders

$S_i$ Order-up-to level for periodic orders

$h_i$ Reorder point for transshipment orders

$H_i$ Order-up-to level for transshipment orders

$o_i$ Offer level

$\Delta o_{min,i}$ Minimum offer quantity

$t_{pool,i}$ Pooling time
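With seven parameters per location and five locations, each particle is a 35-dimensional real vector. The Python sketch below shows one way to decode such a vector into per-location policies. The location-major layout, the parameter keys and the name `decode` are assumptions; the clamping of $t_{pool,i}$ to $[0, t_{P,i}]$ follows the definition of the pooling time.

```python
PARAMS = ("s", "S", "h", "H", "o", "delta_o_min", "t_pool")


def decode(vector, n_locations=5, t_P=10.0):
    """Split a flat PSO position into per-location policy parameters
    (location-major layout assumed)."""
    width = len(PARAMS)
    if len(vector) != width * n_locations:
        raise ValueError(f"expected {width * n_locations} values, got {len(vector)}")
    policies = []
    for i in range(n_locations):
        p = dict(zip(PARAMS, vector[i * width:(i + 1) * width]))
        p["t_pool"] = min(max(p["t_pool"], 0.0), t_P)   # t_pool,i in [0, t_P,i]
        policies.append(p)
    return policies


policies = decode([float(x) for x in range(35)])
print(policies[1])
```

Keeping the decoding separate from the simulator lets the optimizer work on plain vectors while the simulation only sees feasible policy parameters.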

The optimization stops if no new optimum has been found during the last 2,000 cycles,
but at least 10,000 cycles must be performed to prevent early convergence to a local
optimum. On machines with two dual-core Opteron 270 2 GHz processors, one iteration takes
about 15 seconds of runtime. For all experiments, total results, optimized parameter
values and cost function values are determined. After the optimization, a binary search
determines, for every parameter that does not change the cost function values, the value
with the smallest absolute size that still leaves these values unchanged.
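Both the termination rule and the post-optimization binary search can be sketched in Python. The names `should_stop` and `shrink_parameter` are hypothetical. The binary search assumes a deterministic cost evaluation (e.g., a simulation with fixed random seeds) and a single threshold between 0 and the optimized value beyond which the parameter no longer affects the cost.

```python
def should_stop(cycle, last_improvement, patience=2000, min_cycles=10000):
    """Stop once `patience` cycles passed without a new optimum,
    but never before `min_cycles` cycles."""
    return cycle >= min_cycles and cycle - last_improvement >= patience


def shrink_parameter(cost, x, tol=1e-3):
    """Binary search for the value closest to 0 that leaves cost(x) unchanged."""
    base = cost(x)
    if cost(0.0) == base:
        return 0.0
    lo, hi = 0.0, x            # invariant: cost(lo) != base, cost(hi) == base
    while abs(hi - lo) > tol:
        mid = 0.5 * (lo + hi)
        if cost(mid) == base:
            hi = mid
        else:
            lo = mid
    return hi
```

Under this counting convention, a run whose last improvement occurs in cycle 9,562 stops after 11,562 cycles, matching the "optimum (total)" pairs reported in Tables 6.1 and 6.2.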
6.6.2 Results and Discussion



Tables 6.1 and 6.2 show the total results for the service policies FIFO and EDF,
respectively. For each service policy, all combinations of forecasting demand for
transshipment requests and assigning pooling groups are evaluated. Solutions 1 and 2
monitor the current net inventory level to decide on transshipment orders, while
solutions 3 and 4 forecast demand up to the time of the next order. Solutions 1 and 3
allow transshipments between all locations, while solutions 2 and 4 confine
transshipments to adjacent locations at the end of an order period. For each solution,
the total cost per annum and the number of optimization cycles are shown. The rank sorts
the eight systems from Tables 6.1 and 6.2 with respect to their solution quality.



Table 6.1 Overall results for the service policy First-In-First-Out (FIFO). Monitoring the
net-inventory level at the current time t while limiting transshipments to adjacent locations at
the end of the order period is the optimal policy

Solution  Transshipment order      Pooling    Result in € p.a.   Cycle optimum (total)   Rank
1         current time t           all        -19,666,901.01     9,562 (11,562)          3
2         current time t           adjacent   -19,758,378.05     9,562 (11,562)          2
3         time of next order t_1   all        -19,530,460.22     9,562 (11,562)          8
4         time of next order t_1   adjacent   -19,562,191.79     9,562 (11,562)          7



Table 6.2 Total results for the service policy Earliest-Deadline-First (EDF). Observing the current
net inventory level and restricting pooling dominates the other choices, while EDF is
even slightly better than FIFO

Solution  Transshipment order      Pooling    Result in € p.a.   Cycle optimum (total)   Rank
1         current time t           all        -19,652,741.48     8,259 (10,259)          4
2         current time t           adjacent   -19,762,585.05     8,259 (10,259)          1
3         time of next order t_1   all        -19,565,348.10     9,562 (11,562)          6
4         time of next order t_1   adjacent   -19,587,244.27     9,562 (11,562)          5
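The Rank columns of Tables 6.1 and 6.2 can be reproduced by sorting the eight total results, lower being better. The Python sketch below assumes that the result of EDF solution 2 is negative like all the others, as its rank 1 implies; it also computes the relative spread between the best and the worst system.

```python
# Total results in € p.a. from Tables 6.1 (FIFO) and 6.2 (EDF); lower is better.
results = {
    ("FIFO", 1): -19_666_901.01, ("FIFO", 2): -19_758_378.05,
    ("FIFO", 3): -19_530_460.22, ("FIFO", 4): -19_562_191.79,
    ("EDF", 1): -19_652_741.48, ("EDF", 2): -19_762_585.05,
    ("EDF", 3): -19_565_348.10, ("EDF", 4): -19_587_244.27,
}

ranking = sorted(results, key=results.get)
ranks = {system: pos + 1 for pos, system in enumerate(ranking)}
best, worst = results[ranking[0]], results[ranking[-1]]
spread = (worst - best) / abs(best)     # relative gap between best and worst system

print(ranks)
print(f"relative spread: {spread:.2%}")
```

All eight systems lie within roughly 1.2% of each other, so the ranking alone says little about how differently the optimized networks are structured.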



Looking at the total results, all eight systems lie close together, between -19.53 and
-19.76 million € p.a., which suggests a common lower bound regardless of the individual
policies. However, for both service policies solution 2 yields the best performance. For
this system it is advantageous to order transshipments based on the current
net-inventory level and to limit transshipments to adjacent locations at the end of an
order period. But even though the total results are similar, the optimal model structure
varies significantly. Therefore, all solutions are investigated in detail. The optimal
parameter values of the four considered systems are listed in Tables 6.3 and 6.4 for FIFO
and EDF, respectively. Pooling times $t_{pool,i}$ are optimized if transshipments are
restricted to adjacent locations, and set to the order period time $t_{P,i}$ otherwise.
The resulting flows are visualized in Figures 6.8 to 6.10 for the three best solutions.




Fig. 6.8 Flows for solution 2 (rank 1) using the service policy Earliest-Deadline-First (EDF).
Locations 2 and 4 act as hubs. The volumes ordered per period are listed next to these loca-
tions, together with the order frequency in square brackets. Transshipments are indicated by
directed edges, together with transshipment volumes and frequencies in square brackets
Fig. 6.9 Flows for solution 2 (rank 2) using the service policy First-In-First-Out (FIFO). This
solution is similar to EDF solution 2. Locations 2 and 4 act as hubs, while locations 1, 3 and
5 just receive transshipments and thereby act as spokes. Periodic order volumes are listed
next to the hubs, transshipment volumes along the edges, and frequencies in square
brackets



The figures illustrate that solutions 2 for FIFO and EDF are very similar. Moreover,
there are two observations. First, there are locations periodically receiving and
offering product units. These locations act as hubs in a hub-and-spoke structure.
Second, there are locations just receiving transshipments from other locations, and thus
never receiving periodic orders. These locations act as spokes. In EDF solution 2, the
best solution, locations 2 and 4 act as hubs, while locations 1, 3 and 5 are spokes, see
Figure 6.8. Thus, transshipments take over the role of periodic orders rather than just
eliminating shortages due to stochastic demand. This is a consequence of the general
definition of transshipments under continuous review, fixed order cost, and order lead
times.

Fig. 6.10 Flows for First-In-First-Out (FIFO) solution 1 (rank 3). Location 1 is isolated,
ordering products as indicated by the periodic order volume and frequency in square brackets
next to the location, but not exchanging transshipments. The remaining network is integrated
with location 3 as a hub. Transshipments are visualized by directed edges along with volumes
per period and frequencies in square brackets.

Furthermore, some solutions show a specific characteristic: a particular location is
isolated, receiving periodic orders but never exchanging transshipments, e.g., location 1
in FIFO solution 1, cf. Figure 6.10. This points to a limitation of the proposed
heuristic. Ordering and offering decisions are based upon the inventory level and do not
differentiate between target locations. Therefore, in specific situations it may be more
economical not to exchange transshipments at all. Setting up pooling groups is a
potential way to limit the complexity and to guide the optimization process in this
case. Of course, a complete linear optimization of the transportation problem would be
feasible, too, but only at the cost of giving up continuous review.

After studying these elaborate model structures, which solution should a user implement?
Tables 6.1 and 6.2 show the overall cost function values of all solutions for FIFO and
EDF, respectively. Evaluating the individual cost functions presented in Tables 6.5 and
6.6 leads to better-informed decisions. Low out-of-stock cost corresponds to high service
quality, and low order and transshipment cost indicates efficient logistics, provided
the total results are comparable. FIFO solution 1 in Table 6.5 leads to the lowest
out-of-stock cost of all considered systems. A case in point for conflicting objectives
is FIFO solution 4. Product units are constantly
Table 6.3 Parameter values for the service policy First-In-First-Out (FIFO) corresponding
to the systems specified in Table 6.1. Prohibitive values, e.g., reorder points s_i never leading
to a positive ordering decision, are enclosed in square brackets. Thus, hubs can be identified
as locations which periodically order and offer product units. Spokes never receive periodic
orders but replenish their stock via transshipments

              Periodic order        Transshipment order   Product offer          Pooling
Solution  i   s_i        S_i        h_i        H_i        o_i         Δo_min,i    t_pool,i
1         1   457.52     849.45     [-22.49]   [0.00]     [799.91]    [0.10]      10.00 d
          2   [0.00]     [0.10]     169.40     269.74     [0.00]      [276.52]    10.00 d
          3   1,980.59   6,305.89   [0.00]     [0.10]     0.00        447.19      10.00 d
          4   [0.00]     [0.10]     822.94     948.94     0.00        616.75      10.00 d
          5   [0.00]     [0.10]     332.47     480.29     [0.00]      [480.29]    10.00 d
2         1   [0.00]     [0.10]     69.21      123.65     [0.00]      [128.98]    0.00 d
          2   1,896.58   5,075.77   [0.00]     [0.10]     0.00        169.30      0.00 d
          3   [0.00]     [0.10]     156.17     219.60     [0.00]      [219.60]    0.00 d
          4   1,538.05   2,737.03   [-109.73]  [0.00]     0.00        371.50      0.00 d
          5   [0.00]     [0.10]     173.15     222.02     [0.00]      [222.02]    0.00 d
3         1   1,944.91   3,606.09   [0.00]     [0.10]     0.00        580.69      10.00 d
          2   [0.00]     [0.10]     172.74     229.51     [0.00]      [1,111.98]  10.00 d
          3   835.21     1,739.77   [0.00]     [0.10]     [0.00]      [1,696.18]  10.00 d
          4   1,247.85   2,243.85   [-56.56]   [0.00]     [0.00]      [2,063.09]  10.00 d
          5   [0.00]     [0.10]     693.24     693.35     [2,124.58]  [36.83]     10.00 d
4         1   [-24.46]   [0.00]     -123.47    85.79      [0.00]      [706.60]    0.00 d
          2   2,218.16   7,367.75   936.48     4,158.19   252.98      6.99        0.00 d
          3   [0.00]     [0.10]     71.52      166.59     [0.00]      [1,359.51]  0.00 d
          4   [0.00]     [0.10]     502.27     518.07     [1,624.89]  [327.09]    5.89 d
          5   [0.00]     [0.10]     3,842.22   5,958.31   599.10      517.20      0.00 d
Table 6.4 Parameter values for the service policy Earliest-Deadline-First (EDF) correspond-
ing to the systems specified in Table 6.2. Square brackets indicate prohibitive values, never
leading to a positive ordering or transshipment decision. Hence, analogous to FIFO, there are
hubs, which order periodically, and spokes, which receive transshipments but never order
periodically

              Periodic order        Transshipment order   Product offer          Pooling
Solution  i   s_i        S_i        h_i        H_i        o_i         Δo_min,i    t_pool,i
1         1   [0.00]     [0.10]     1,119.30   2,636.55   0.00        356.60      10.00 d
          2   839.26     4,918.42   [0.00]     [0.10]     943.20      0.10        10.00 d
          3   [0.00]     [0.10]     513.79     601.69     [0.00]      [636.77]    10.00 d
          4   1,276.91   2,272.90   [-133.33]  [0.00]     [1,462.34]  [170.68]    10.00 d
          5   [0.00]     [0.10]     740.30     872.29     [0.00]      [872.29]    10.00 d
2         1   [0.00]     [0.10]     82.35      126.48     [0.00]      [128.99]    0.00 d
          2   1,826.91   5,069.91   [0.00]     [0.10]     0.00        229.32      0.00 d
          3   [0.00]     [0.10]     128.96     240.01     [0.00]      [240.02]    0.00 d
          4   1,610.11   2,738.23   [-133.32]  [0.00]     0.00        361.77      0.00 d
          5   [0.00]     [0.10]     171.41     226.53     [0.00]      [226.54]    0.00 d
3         1   457.03     848.96     [-66.15]   [-66.05]   0.00        834.33      10.00 d
          2   [0.00]     [0.10]     345.71     348.45     295.33      901.85      10.00 d
          3   1,200.00   4,543.22   [0.00]     [0.10]     461.66      411.10      10.00 d
          4   1,242.79   2,238.78   [-219.85]  [0.00]     270.82      1,251.48    10.00 d
          5   [0.00]     [0.10]     618.03     642.67     0.00        1,580.89    10.00 d
4         1   [0.00]     [0.10]     239.10     239.46     [0.00]      [697.94]    0.00 d
          2   900.00     5,112.43   [0.00]     [0.10]     919.15      145.43      0.00 d
          3   756.54     1,658.42   [-122.57]  [0.00]     [1,125.11]  [349.98]    0.00 d
          4   [-26.06]   [0.00]     522.87     623.86     893.38      485.86      6.01 d
          5   [-55.03]   [0.00]     1,015.13   1,015.24   0.00        1,679.06    0.00 d
Table 6.5 Cost function values for the service policy First-In-First-Out (FIFO) corresponding to
the systems specified in Table 6.1. Different performance aspects correlate with individual
cost functions, e.g., high service quality with low out-of-stock cost, and efficient logistics
with low order and transshipment cost

Solution  i   Inventory cost  Out-of-stock cost  Periodic order cost  Transshipment cost  Gain
              in € p.a.       in € p.a.          in € p.a.            in € p.a.           in € p.a.
1         1   155,087.59      411.44             62,299.55            0.00                -2,208,904.47
          2   69,562.63       1,255.94           0.00                 0.00                -3,283,135.39
          3   762,445.29      1,022.29           604,583.39           40,015.55           -4,400,362.71
          4   283,973.96      28.74              0.00                 2,260.93            -5,459,167.75
          5   136,758.42      1,772.39           0.00                 0.00                -6,436,808.79
          Σ   1,407,827.89    4,490.79           666,882.94           42,276.48           -21,788,379.12
2         1   29,054.96       231.05             0.00                 0.00                -2,210,119.90
          2   725,377.76      3,427.50           402,704.69           46,190.34           -3,263,712.13
          3   57,920.84       132.53             0.00                 0.00                -4,408,758.96
          4   426,709.86      2,336.54           240,136.41           3,281.27            -5,432,982.09
          5   61,878.88       1,199.26           0.00                 0.00                -6,443,386.84
          Σ   1,300,942.30    7,326.86           642,841.09           49,471.61           -21,758,959.92
3         1   195,878.78      7,102.64           255,016.95           30,223.49           -2,160,377.96
          2   181,524.08      5,197.14           0.00                 0.00                -3,248,107.96
          3   282,963.26      2,473.43           149,532.34           0.00                -4,385,451.48
          4   356,317.34      2,970.94           207,771.43           0.00                -5,426,028.47
          5   462,172.90      640.30             0.00                 0.00                -6,450,279.37
          Σ   1,478,856.36    18,384.46          612,320.71           30,223.49           -21,670,245.23
4         1   131,398.36      3,234.10           0.00                 0.00                -2,189,556.93
          2   77,948.69       2,580.25           560,777.79           356,487.74          -3,271,890.82
          3   213,627.07      2,851.20           0.00                 0.00                -4,379,903.26
          4   328,564.02      527.37             0.00                 0.00                -5,454,538.59
          5   185,321.49      1,562.78           0.00                 309,517.80          -6,440,700.86
          Σ   936,859.63      10,755.70          560,777.79           666,005.55          -21,736,590.46



being shipped, and thus, inventory cost is low, while transshipment cost is excessive.
Therefore, when the total results are inconclusive, it is reasonable to compare the
solutions with respect to individual aspects. To emphasize the importance of these
aspects, the cost function coefficients are adjusted accordingly.
Table 6.6 Cost function values for service policy Earliest-Deadline-First (EDF) corresponding
to the systems specified in Table 6.2. Out-of-stock costs monitor service quality, while
order and transshipment costs highlight logistics efficiency

System  i   Inventory      Out-of-stock   Periodic order   Transshipment   Gain
            cost           cost           cost             cost
            in € p.a.      in € p.a.      in € p.a.        in € p.a.       in € p.a.

1       1   365,747.27     1,146.62       0.00             27,696.44       -2,203,858.57
        2   244,008.31     2,268.36       425,373.39       37,028.38       -3,274,103.38
        3   165,081.55     1,631.23       0.00             0.00            -4,393,509.79
        4   366,053.23     2,325.96       208,040.62       0.00            -5,433,807.75
        5   245,453.35     1,407.04       0.00             0.00            -6,440,723.73
        Σ   1,386,343.71   8,779.21       633,414.00       64,724.82       -21,746,003.22

2       1   31,999.32      175.38         0.00             0.00            -2,210,516.80
        2   728,726.45     1,614.28       401,393.38       43,385.79       -3,279,887.25
        3   56,645.58      284.35         0.00             0.00            -4,407,542.41
        4   420,698.60     2,864.40       241,972.40       3,284.45        -5,425,756.66
        5   62,273.66      1,816.12       0.00             0.00            -6,436,016.10
        Σ   1,300,343.62   6,754.53       643,365.78       46,670.24       -21,759,719.22

3       1   154,912.34     419.23         62,298.78        0.00            -2,208,866.14
        2   203,470.87     765.83         0.00             0.00            -3,287,028.57
        3   332,156.91     2,131.02       440,773.80       20,136.40       -4,387,147.81
        4   354,632.10     3,077.18       207,722.22       0.00            -5,424,566.47
        5   391,085.60     2,037.10       0.00             0.00            -6,433,358.52
        Σ   1,436,257.83   8,430.37       710,794.80       20,136.40       -21,740,967.51

4       1   142,997.01     314.92         0.00             0.00            -2,209,536.26
        2   209,395.76     5,015.98       450,905.39       35,723.49       -3,249,231.08
        3   257,170.43     5,751.66       148,471.55       0.00            -4,350,342.94
        4   336,575.82     1,066.56       0.00             13,377.00       -5,448,040.82
        5   478,784.82     2,159.82       0.00             13,033.75       -6,430,837.13
        Σ   1,424,923.84   14,308.94      599,376.94       62,134.24       -21,687,988.22

6.7 Conclusion and Future Work 



The Simulation Optimization approach is applicable to very general multi-location
inventory systems. The concept presented in this chapter iteratively combines a
simulator with Particle Swarm Optimization. This concept allows the investigation of
complex models with few assumptions and, unlike analytical approaches, is not
theoretically limited in the number of locations. Due to the complexity of the model,
it is difficult to understand the effect of certain policies. Therefore, in addition to
the optimal parameter set, simulation yields valuable insights into the dynamics of the
system. However, applying global optimization to complex models still carries a certain
risk of ending in a local optimum. This risk can be limited by extending the simulation
time and the number of optimization cycles. The optimum depends on the model
specification and shows a specific structure. The development of such a structure is
one of the most intriguing aspects, and it raises the question of which conditions
promote it.

As mentioned above, an advantage of the Simulation Optimization of multi-location
inventory systems with lateral transshipments is that the model itself is readily
extendable. Functional extensions include, e.g., policies for periodic orders,
transshipment orders and product offers. By extending the parameter set itself, the
capacities of the locations can be optimized by introducing real-estate and energy costs
for unused storage. Thus, not only the transshipment flows are optimized, but also the
allocation of capacities. In addition to static aspects of the model, the parameter set
may be extended by dynamic properties such as the location-specific order period.
Besides these extensions, one idea concerns orders placed by more than one location at
a time. Under specific circumstances, one location evolves into a supplier, ordering and
redistributing product units. The basic idea is therefore to release a joint order for
several locations and to solve the corresponding Traveling Salesman Problem at minimal
cost. However, the existing heuristics already seem to approximate such a
transportation logic well, and thus including more elaborate policies is expected only
to increase complexity. Further research may also concentrate on characteristics that
favor demand forecasting and promote certain flows through a location network, leading
to such a structure.
Chapter 7

Traditional and Hybrid Derivative-Free Optimization Approaches for Black Box Functions

Genetha Anne Gray and Kathleen R. Fowler



Abstract. Picking a suitable optimization solver for any optimization problem is 
quite challenging and has been the subject of many studies and much debate. This 
is due in part to each solver having its own inherent strengths and weaknesses. For 
example, one approach may be global but have slow local convergence properties, 
while another may have fast local convergence but be unable to globally search the
entire feasible region. In order to take advantage of the benefits of more than one 
solver and to overcome any shortcomings, two or more methods may be combined, 
forming a hybrid. Hybrid optimization is a popular approach in the combinatorial 
optimization community, where metaheuristics (such as genetic algorithms, tabu 
search, ant colony, variable neighborhood search, etc.) are combined to improve 
robustness and blend the distinct strengths of different approaches. More recently, 
metaheuristics have been combined with deterministic methods to form hybrids that 
simultaneously perform global and local searches. In this Chapter, we will examine
the hybridization of derivative-free methods to address black box, simulation-based
optimization problems. In these applications, the optimization is guided solely by
function values (i.e., not by derivative information), and computing the function
values requires the output of a computational model. Specifically, we will focus on
improving derivative-free sampling methods through hybridization. We will review
derivative-free optimization methods, discuss possible hybrids, describe intelligent
hybrid approaches that properly utilize both methods, and give an example of the
successful application of hybrid optimization to a problem from the hydrological
sciences.
7.1 Introduction and Motivation

Computer simulation is an important tool that is often used to reduce the costs
associated with the study of complex systems in science and engineering. In recent
years, simulation has been paired with optimization in order to design and control
such systems. The resulting simulation-based problems have objective functions
and/or constraints that rely on the output of sophisticated simulation programs.
A simulator is often referred to as a black box since it is defined solely by its
input and output and not by the actual program being executed. In other words, the
underlying structure of the simulation is unknown. In these applications, the problem
characteristics are mathematically challenging in that the optimization landscapes may
be disconnected, nonconvex, or nonsmooth, or may contain multiple undesirable local
minima. Gradient-based optimization methods are well known to perform poorly on
problems with these characteristics, as derivatives are often unavailable and
approximations may be insufficient [14]. Moreover, it has been shown that derivative
approximations of functions that incorporate noisy data may contain too much error to
be useful [56]. Instead, derivative-free optimization (DFO) methods, which advance
using only function values, are applied. A variety of DFO methods have emerged and
matured over the years to address simulation-based problems, and many are supported by
established convergence theory. In this Chapter, we will review some such methods and
demonstrate their utility on a water management application proposed specifically in
the literature as a simulation-based optimization benchmarking problem.

To obtain a solution to a simulation-based problem, one seeks an optimization 
algorithm that is (i) reliable in the sense that similar solutions can be obtained using
different initial points or optimization parameters, (ii) accurate in that a reasonable
approximation to the global minimum is obtained, and (iii) efficient with respect to
finding a solution using as few function calls as possible. The role of efficiency is 
particularly important in simulation-based applications because the optimization is 
guided solely by function values defined in terms of output from a black box. In 
practice, the computational time required to complete these simulations can range 
from a few seconds to a few days depending on the application and problem size. 
Thus, parallel implementations of both the simulator and the optimization methods 
are often essential for computational tractability of black-box problems. 

Picking a suitable optimization algorithm that meets these criteria is quite chal- 
lenging and has been the subject of many studies and much debate. This is be- 
cause every optimization technique has inherent strengths and weaknesses. More- 
over, some optimization algorithms contain characteristics which make them better 
suited to solve particular kinds of problems. Hybridization, or the combining of two 
or more complementary, but distinct methods, allows the user to take advantage 
of the beneficial elements of multiple methods. For example, consider two meth- 
ods A and B where method A is capable of handling noise and undefined points 
and method B excels in smooth regions with small amounts of noise. In this case, 
method A may be unacceptably slow to find a solution while method B may fail 
in noisy or discontinuous regions of the domain. By forming a hybrid, method A 
can help overcome difficult regions of the domain, and method B can be applied for
fast convergence and efficiency. This Chapter also explores the promise of hybrid 
approaches and demonstrates some results for the water management problem. 
Throughout this Chapter, the problem of interest is

$$\min_{x \in \Omega} f(x), \qquad (7.1)$$

where the objective function is $f : \mathbb{R}^n \rightarrow \mathbb{R}$ and $\Omega$ defines the feasible search space.
In practice, $\Omega$ may consist of component-wise bound constraints on the decision
variable $x$ in combination with linear and nonlinear equality or inequality
constraints. Often, $\Omega$ may be further defined in terms of state variables
determined by simulation output. The example in this Chapter includes such
constraints. In addition, integer and categorical variables (for example, those that
require a 'yes' or 'no') are present in many engineering applications. A variety of
DFO methods are equipped to handle these classes of problems, and several are discussed
later. For the application in this work, we consider both real-valued and
mixed-integer problem formulations.

The rest of this Chapter is outlined as follows: In Section 2, an example of a 
black box optimization problem from hydrology is introduced. Then, in Section 
3, some DFO approaches are introduced, including a genetic algorithm (GA), DIRECT,
asynchronous parallel pattern search (APPS) and implicit filtering. In addition,
these methods are demonstrated on the example introduced in Section 2.
Section 4 describes some hybrid methods created using the classical DFO methods 
from Section 3, and describes their performance on the example problem. Finally, 
Section 5 summarizes all the information given in this Chapter and gives some ideas 
regarding future research directions for hybrid optimization. 



7.2 A Motivating Example 

To demonstrate the strengths and weaknesses of some DFO methods and to better
illustrate the utility of hybrids, this Chapter will focus on the results from a water
supply problem, denoted WS in the remainder of this Chapter, which was described in
[72, 71]. Problem WS has been used as a benchmarking problem for optimization methods
[5, 32, 29, 50, 43] and was shown to be highly dependent on the formulation of the
feasible region $\Omega$ [50, 29]. Furthermore, the use of WS in a comparison study of
DFO methods in [29] showed that (1) there are multiple, undesirable local minima that
can trap local search methods and (2) the constraints on the state variables are
highly sensitive to changes in the decision variables.

The goal in the WS problem is to extract a quantity of water from a particular
geographic region, an aquifer, while minimizing the capital cost $f^c$ to install a
well and the operational cost $f^o$ to operate a well. Thus, the optimization problem
is to minimize the total cost of the well field, $f^T = f^c + f^o$, subject to bound
constraints on the decision variables, the amount of water extracted, and the physical
properties of the aquifer. The decision variables for this problem are the pumping
rates $\{Q_k\}_{k=1}^n$, the well locations $\{(x_k, y_k)\}_{k=1}^n$, and the number of
wells $n$ in the final design. A negative pumping rate, $Q_k < 0$ for some $k$, means
that a well is extracting, and a positive pumping rate, $Q_k > 0$ for some $k$, means
that a well is injecting. The objective function and constraints of WS rely on the
solution of a nonlinear partial differential equation to obtain values of the
hydraulic head, $h$, which determines the direction of flow. Thus, $h$ is considered a
state variable. In this example, for each well $k = 1, \dots, n$ in a candidate set of
wells, the hydraulic head $h_k$ must be obtained via simulation.



The objective function, based on the one proposed in [72, 71], is given by

$$f^T = \underbrace{\sum_{k=1}^{n} D_k + \sum_{k:\,Q_k<0} c_1 |1.5\,Q_k|^{b_1} (z_{gs} - h_{min})^{b_2}}_{f^c} + \underbrace{\int_0^{t_f} \Big( \sum_{k:\,Q_k<0} c_2\, Q_k (h_k - z_{gs}) + \sum_{k:\,Q_k>0} c_3\, Q_k \Big)\, dt}_{f^o}, \qquad (7.2)$$

where $c_j$ and $b_j$ are cost coefficients and exponents given in [71]. In the first
$f^c$ term, $D_k$ is the cost to drill and install well $k$. The second term of $f^c$
includes the cost to install a pump for each extraction well; this cost is based on
the extraction rate, on $h_{min} = 10$ m, the minimum allowable hydraulic head, and on
the ground surface elevation $z_{gs}$. The calculation of $f^o$ is for $t_f = 5$ years.
The first part of the integral includes the cost to lift the water to the surface,
which depends on the hydraulic head $h_k$ in well $k$. The second part accounts for any
injection wells, which are assumed to operate under gravity. Details pertaining to the
aquifer and groundwater flow model are fully described in [72] and are not included
here, as they fall outside the scope of the application of optimization methods to
solve the WS problem.
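To make the structure of Equation (7.2) concrete, the following Python sketch evaluates
$f^T$ for a candidate design. It is a minimal illustration under stated assumptions, not
the benchmark code: the container `WSCoefficients` and the function `total_cost` are
hypothetical names, the actual values of $c_j$, $b_j$ and $D_k$ must be taken from [71],
and the heads $h_k$ are assumed to stay constant over $[0, t_f]$, so that the time
integral reduces to a product with $t_f$.

```python
from dataclasses import dataclass


@dataclass
class WSCoefficients:
    """Hypothetical container; the real coefficient values are given in [71]."""
    c1: float
    c2: float
    c3: float
    b1: float
    b2: float
    z_gs: float          # ground-surface elevation [m]
    t_f: float           # operating horizon, in the time unit assumed by c2 and c3
    h_min: float = 10.0  # minimum allowable hydraulic head [m]


def total_cost(Q, h, D, c):
    """f^T = f^c + f^o from Eq. (7.2), assuming Q_k and h_k are constant in time."""
    # Capital cost f^c: drilling/installation of every well plus a pump
    # for every extraction well (Q_k < 0).
    f_c = sum(D) + sum(c.c1 * abs(1.5 * q) ** c.b1 * (c.z_gs - c.h_min) ** c.b2
                       for q in Q if q < 0)
    # Integrand of f^o: lifting cost for extraction wells, cost of injection wells.
    rate = sum(c.c2 * q * (hk - c.z_gs) for q, hk in zip(Q, h) if q < 0)
    rate += sum(c.c3 * q for q in Q if q > 0)
    # Under the constant-state assumption, the integral over [0, t_f] is rate * t_f.
    return f_c + rate * c.t_f
```

In the actual problem each $h_k$ would come from a MODFLOW run for the current rates and
locations; here the heads are simply passed in as numbers.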

Note that although the well locations $\{(x_k, y_k)\}_{k=1}^n$ do not explicitly appear
in Equation (7.2), they enter through the state variable $h$ as output from a
simulation tool. For this work, the U.S. Geological Survey code MODFLOW [92] was used
to calculate the head values. MODFLOW is a widely used and well supported
block-centered finite difference code that simulates saturated groundwater flow. Since
the well locations must lie on the finite difference grid, real-valued locations must
be rounded to grid locations for the simulation. This results in small steps and low
amplitude noise in the optimization landscapes.

The constraints for the WS application are given as limitations on the pumping
rates,

$$-0.0064\ \mathrm{m^3/s} \le Q_k \le 0.0064\ \mathrm{m^3/s}, \quad k = 1, \dots, n, \qquad (7.3)$$

and impact on the aquifer in terms of the hydraulic head,

$$10\ \mathrm{m} \le h_k \le 30\ \mathrm{m}, \quad k = 1, \dots, n. \qquad (7.4)$$
The constraints given in Equations (7.3) and (7.4) are enforced at each well. The
total amount of water to supply is defined by the constraint

$$\sum_{k=1}^{n} Q_k \le -0.032\ \mathrm{m^3/s}. \qquad (7.5)$$

While the pumping rates and locations are real-valued, there are options for how to
define the variable that indicates the appropriate number of wells. One approach is to
start with a large number of candidate wells, $N_w$, and run multiple optimization
scenarios where, at the end of each one, wells with sufficiently low pumping rates are
removed before the optimization routine continues. However, for realistic water
management problems, simulations are time-consuming, so it is more attractive to
determine the number of wells as the optimization progresses. One way to do this is to
include integer variables $\{z_i\}_{i=1}^{n}$, where each $z_i \in \{0, 1\}$ is a binary
indicator for assigning a well as off or on. Since this formulation requires an
optimization algorithm that can handle integer variables, alternatives have been
developed. In [50], three formulations that implicitly determine the number of wells
while avoiding the inclusion of integer variables are compared. Two formulations are
based on a multiplicative penalty formulation ([69]), and one is based on removing a
well during the course of the optimization if its rate becomes sufficiently low. This
third technique is implemented here using an inactive-well threshold given by

$$|Q_k| < 10^{-6}\ \mathrm{m^3/s}, \quad k = 1, \dots, n. \qquad (7.6)$$

Note that the cost to install a well is roughly $20,000, and the operational cost is 
about $1,000 per year. Thus, using as few wells as possible drives the optimization
regardless of the formulation. However, the inclusion of Equation (7.6) in the
formulation results in a narrow region of decrease for an optimization method to find,
but one that yields a large decrease in cost. Mathematically, using Equation (7.6)
allows for real-valued DFO methods, but adds further discontinuities to the
minimization landscapes.

The implementation of the WS problem considered in this study was taken from
http://www4.ncsu.edu/~ctk/community.html, where the entire package of simulation data
files and objective function/constraint subroutines is available for download. The
final design solution is known to be five wells, all operating at
$Q_i = -0.0064\ \mathrm{m^3/s}$, with locations aligned with the north and east
boundaries, as shown in Table 7.1. See [32] for details. To study the DFO methods
described in this Chapter, a starting point with six candidate wells was used. In order
to find the solution, the optimization methods must determine that one well must be
shut off while simultaneously optimizing the rates and locations of the remaining
wells. Furthermore, the rates must lie on the boundary of the constraint in Equation
(7.3) in order to satisfy the constraint given in Equation (7.5). Thus, the WS problem
contains challenging features for simulation-based optimization problems that are not
unique to environmental engineering but can be seen across many scientific and
engineering disciplines. To summarize, the challenges of the WS problem include a
black box objective function and constraints, linear and nonlinear constraints on the
decision and state variables, multiple problem formulations, low amplitude noise, a
discontinuous and disconnected feasible region, and multiple local minima.

Table 7.1 Five-well solution to WS with pumping rates $Q_i = -0.0064\ \mathrm{m^3/s}$, $i = 1, \dots, 5$

Well Number    1      2      3      4      5
x [m]          350    788    722    170    800
y [m]          724    797    579    800    152



7.3 Some Traditional Derivative-Free Optimization Methods 

In this Section, we highlight some DFO approaches to solving the simulation-based WS
problem, including two global methods (the genetic algorithm (GA) and Dividing
RECTangles (DIRECT)), two local methods (asynchronous parallel pattern search (APPS)
and implicit filtering), and a statistical alternative that utilizes a Gaussian
process model. Global optimization methods seek the extreme value of a given function
in a specified feasible region. A global solution is optimal among all possible
solutions. In contrast, local methods identify points that are only optimal within a
neighborhood of that point. The methods in this Section are in no way an exhaustive
list of derivative-free methods; instead, they are included to illustrate the
importance of selecting a method appropriate for the application. The derivative-free
optimization community remains active in algorithm development. Some examples of
ongoing development include: Design Explorer [7] from the Boeing Company, which
incorporates surrogates in the search phase; NOMAD (Nonlinear Optimization by Mesh
Adaptive Direct Search), specifically designed to solve simulation-based problems; and
ORBIT (Optimization by Radial Basis Function Interpolation in Trust Region) [77, 88],
which makes use of radial basis functions. We encourage interested readers to refer to
the citations for more information and to investigate books such as [14], which give a
complete overview of the topic.

7.3.1 Genetic Algorithms (GAs) 

Genetic algorithms [36, 53, 54] are one of the most widely used DFO methods and are
part of a larger class of evolutionary algorithms called population-based, global
search, heuristic methods [36]. Population-based GAs are based on biological processes
such as survival of the fittest, natural selection, inheritance, mutation, and
reproduction. Heuristic methods, such as the GA, are experience-based. They contrast
with deterministic methods, which search the domain space more systematically.

The GA codes design points as "individuals" or "chromosomes", typically as binary
strings, in a population. It is this binary representation that makes GAs attractive
for integer problems (such as WS), since the on-off representation is immediate.
Through the above biological processes, the population evolves over a user-specified
number of generations towards a smaller fitness value. A simple GA can be defined as
follows:
1. Generate a random or seeded initial population of size $n_p$

2. Evaluate the fitness of individuals in initial population 

3. Iterate through the specified number of generations: 

a. Rank fitness of individuals 

b. Perform selection 

c. Perform crossover and mutation 

d. Evaluate fitness of newly-generated individuals 

e. Replace non-elite members of population with new individuals 

During the selection phase, better-fit individuals are arranged randomly to form a
mating pool on which further operations are performed. Crossover attempts to exchange
information between two design points to produce a new point that preserves the best
features of both 'parent points'; this is illustrated for a binary string in Figure
7.1. Mutation is used to prevent the algorithm from terminating prematurely at a
suboptimal point and as a means to explore the design space; it is illustrated for a
binary string in Figure 7.2. (Note that both Figures 7.1 and 7.2 were taken from
[49].) The algorithm terminates after a prescribed number of generations or when the
fitness of the highest-ranked individual has reached a plateau.
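The steps of the simple GA listed above can be sketched in Python as follows. This is an
illustrative minimal version, not the NSGA-II implementation used for WS: it assumes
binary tournament selection, single-point crossover, bit-flip mutation, a fixed number
of elite individuals, and termination after a fixed number of generations. The function
name `simple_ga` and all default parameter values are hypothetical.

```python
import random


def simple_ga(fitness, n_bits, pop_size=20, generations=50, n_elite=2,
              p_mut=0.02, seed=0):
    """Minimise `fitness` over bit strings with a basic elitist GA."""
    rng = random.Random(seed)
    # 1. Random initial population of bit strings ("chromosomes").
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # 2. Evaluate fitness (smaller is better).
    scored = [(fitness(ind), ind) for ind in pop]

    def select():
        # b. Binary tournament: the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        scored.sort(key=lambda s: s[0])          # a. rank by fitness
        elite = scored[:n_elite]
        children = []
        while len(children) < pop_size - n_elite:
            p1, p2 = select(), select()
            cut = rng.randint(1, n_bits - 1)     # c. single-point crossover ...
            child = p1[:cut] + p2[cut:]
            # ... followed by bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        # d./e. Evaluate the children and replace all non-elite members.
        scored = elite + [(fitness(c), c) for c in children]
    return min(scored, key=lambda s: s[0])
```

Because the elite individuals are carried over unchanged, the best fitness in the
population can never get worse from one generation to the next.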
Fig. 7.1 The crossover process for a binary string ([49])



Often, GAs are criticized for their computational complexity and dependence on
optimization parameter settings, which are not known a priori [22, 42, 68]. Also,
since the GA incorporates randomness into the search phase, multiple optimization runs
may be needed. However, if the user is willing to expend a large number of function
evaluations, a GA can help provide insight into the design space and locate initial
points for fast local search methods. The GA has many alternate forms and has been
applied to a wide range of engineering design problems, as shown in references such as
[60]. Moreover, hybrid GAs have been developed at all levels of the algorithm and with
a variety of other global and local search DFO methods. See, for example, [8], [76]
and the references therein.
Fig. 7.2 The mutation process for a binary string ([49])



In S 0, Hi, the NSGA-II implementation HH 00] of the GA was 
used on the WS problem for both the mixed-integer formulation and the inactive 
well-threshold to determine the wells. It was shown that for this problem the GA
performed better if (1) the number of wells was determined directly by including the
binary on-off switch rather than by using the inactive-well threshold and (2) the
initial population was seeded with points that had at least five wells operating at
$-0.0064\ \mathrm{m^3/s}$. If a random initial population was used, the algorithm could
not identify the solution after 4,000 function evaluations. If the GA was seeded
accordingly, a solution was found within 161 function calls, but the function
evaluation budget, which was set to 900 in that work, was exhausted before the
algorithm terminated.



7.3.2 Deterministic Sampling Methods 



Another class of DFO methods is deterministic sampling methods [89, 67, 63, 75]. In
general, these methods rely upon a direct search of the decision space and are guided
by a pattern or search algorithm. They differ from GAs in that there is no randomness
in the method, and rigorous convergence results exist (see [63] and references
therein).



7.3.2.1 Asynchronous Parallel Pattern Search (APPS) 



Asynchronous Parallel Pattern Search (APPS) [55, 62] is a direct search method that uses
a predetermined pattern of points to sample a given function domain. APPS is an
example of a generating set search (GSS), a class of algorithms for bound and linearly
constrained optimization that obtain conforming search directions from generators of
local tangent cones [65, 64]. In the case that only bound constraints are present, GSS
is identical to a pattern search. The majority of the computational cost of pattern
search methods is the function evaluations, so parallel pattern search (PPS)
techniques have been developed to reduce the overall computation time. Specifically,
PPS exploits the fact that once the points in the search pattern have been defined,
the function values at these points can be computed simultaneously [23, 84]. For
example, for a simple two-dimensional function, consider the illustrations in Figure
7.3. First, the points f, g, h, and i in the stencil around point c are evaluated.
Then, since f results in the smallest function value, the second picture shows a new
stencil around point f. Finally, in the third picture, since none of the points in
this new stencil results in a new local minimum, the step size of the stencil is
reduced.

Fig. 7.3 Illustration of the steps of Parallel Pattern Search (PPS) for a simple two-
dimensional function. On the left, an initial PPS stencil around starting point c is
shown. In the middle, a new stencil is created after successfully finding a new local
min (f). On the right, PPS shrinks the stencil after failing to find a new minimum
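A synchronous PPS loop on the two-dimensional compass stencil of Figure 7.3 can be
sketched as below. The sketch assumes that the stencil consists of the four coordinate
directions and uses a thread pool from Python's `concurrent.futures` to stand in for
parallel function evaluations; `pool.map` only returns once every stencil value is
available, which is exactly the synchronization point that APPS removes. The function
name `pps` and its defaults are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pps(f, center, step=1.0, tol=1e-3, workers=4):
    """Synchronous parallel pattern search on a 2-D compass stencil."""
    x, y = map(float, center)
    fc = f((x, y))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while step >= tol:
            stencil = [(x + step, y), (x - step, y), (x, y + step), (x, y - step)]
            # Barrier: all four stencil points are evaluated before we continue.
            values = list(pool.map(f, stencil))
            k = min(range(len(stencil)), key=values.__getitem__)
            if values[k] < fc:
                (x, y), fc = stencil[k], values[k]   # success: re-center the stencil
            else:
                step /= 2                            # failure: shrink the stencil
    return (x, y), fc
```

For an expensive simulator, a `ProcessPoolExecutor` or MPI-based workers would replace
the threads, while the control flow stays the same.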

The APPS algorithm is a modification of PPS that eliminates the synchronization
requirement that the function values of all the points in the current search pattern
must be computed before the algorithm can progress. It retains the positive features of
PPS but reduces processor latency and requires less total time than PPS to return
results [55]. Implementations of APPS have minimal requirements on the number of
processors (i.e., 2 instead of n + 1 for PPS) and do not assume that the amount of time
required for an objective function evaluation is constant or that the processors are
homogeneous.

The implementation of the APPS algorithm is more complicated than a basic GSS 
in that it requires careful bookkeeping. However, the details are irrelevant to the 
overall understanding of the method. Instead we present a basic GSS algorithm and 
direct interested readers to [40] for a detailed description and analysis of the APPS 
algorithm and corresponding APPSPACK software. The basic GSS algorithm is: 



Let $x_0$ be the starting point, $\Delta_0$ be the initial step size, and $\mathcal{D}$ be
the set of positive spanning directions.
While not converged Do

1. Generate trial points $\mathcal{Q}_k = \{x_k + \tilde{\Delta}_i d_i : 1 \le i \le |\mathcal{D}|\}$,
where $\tilde{\Delta}_i \in [0, \Delta_k]$ denotes the maximum feasible step along $d_i$.

2. Evaluate trial points (possibly in parallel).

3. If $\exists\, x_q \in \mathcal{Q}_k$ such that $f(x_q) - f(x_k) < -\alpha \Delta_k^2$,
Then $x_{k+1} = x_q$ (successful iteration)
Else $x_{k+1} = x_k$ (unsuccessful iteration) and $\Delta_{k+1} = \Delta_k / 2$ (step size
reduction)
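The basic GSS loop above can be written out as a short serial Python sketch for
bound-constrained problems. Assumptions: the positive spanning set is
$\{\pm e_1, \dots, \pm e_n\}$, the maximum feasible step along each direction is obtained
by clipping to the box, the best trial point satisfying the sufficient decrease test of
step 3 is accepted, and the step size is left unchanged after a success. The function
name `gss` is hypothetical; this is not APPSPACK code.

```python
def gss(f, x0, lower, upper, delta0=1.0, alpha=1e-4, tol=1e-6, max_iter=10_000):
    """Basic generating set search with sufficient decrease on a box [lower, upper]."""
    x = [float(v) for v in x0]
    n = len(x)
    # Positive spanning set {+e_j} U {-e_j}, stored as (coordinate, sign) pairs.
    directions = [(j, s) for s in (1.0, -1.0) for j in range(n)]
    fx, delta = f(x), delta0
    for _ in range(max_iter):
        if delta < tol:
            break
        # Step 1: trial points with the largest feasible step in [0, delta].
        trials = []
        for j, s in directions:
            room = upper[j] - x[j] if s > 0 else x[j] - lower[j]
            step = min(delta, room)
            if step > 0:
                t = list(x)
                t[j] += s * step
                trials.append(t)
        # Step 2: evaluate (serially here; this is where parallelism would go).
        values = [f(t) for t in trials]
        # Step 3: keep points with f(x_q) - f(x_k) < -alpha * delta^2.
        accepted = [(v, t) for v, t in zip(values, trials)
                    if v - fx < -alpha * delta ** 2]
        if accepted:
            fx, x = min(accepted, key=lambda vt: vt[0])  # success: delta unchanged
        else:
            delta /= 2                                   # failure: halve the step
    return x, fx
```

When the unconstrained minimizer lies outside the box, the clipped steps let the iterate
land exactly on the bound, as in the second usage case below.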



Note that in a basic GSS, after a successful iteration (one in which a new best point
has been found), the step size is either left unchanged or increased. In contrast,
when the iteration is unsuccessful, the step size is necessarily reduced. A defining
difference between the basic GSS and APPS is that the APPS algorithm processes the
directions independently, and each direction may have its own corresponding step size.
Global convergence to locally optimal points is ensured by a sufficient decrease
criterion for accepting new best points. A trial point $x_k + \Delta d_i$ is considered
better than the current best point $x_k$ if

$$f(x_k + \Delta d_i) - f(x_k) < -\alpha \Delta^2, \qquad (7.7)$$

for $\alpha > 0$. Because APPS processes search directions independently, it is possible
that the current best point is updated to a new, better point before all the function
evaluations associated with a set of trial points $\mathcal{Q}_k$ have been completed.
These results are referred to as orphaned points, as they are no longer tied to the
current search pattern, and attention must be paid to ensure that the sufficient
decrease criterion is applied appropriately. The support of these orphaned points is a
feature of the APPS algorithm that makes it naturally amenable to a hybrid optimization
structure. Iterates generated by alternative algorithms can simply be treated as
orphans without the loss of favorable theoretical properties or the local convergence
theory of APPS. It is important to note that this paradigm is in fact extensible to
many other optimization routines and makes the APPS algorithm particularly amenable to
hybridization, in that it can readily accommodate externally generated points.

The APPS algorithm has been implemented in an open source software package
called APPSPACK. It is written in C++ and uses MPI [47, 48] for parallelism.
APPSPACK performs function evaluations through system calls to an external
executable, which can be written in any computer language. This simplifies its execution
and makes it amenable to customization. Moreover, the most recent version of
APPSPACK can handle linear constraints [46, 64], and a software package
called HOPSPACK builds on APPSPACK and includes a GSS solver
that can handle nonlinear constraints [45, 74].



In [29], APPS was applied to the WS problem using the constraint in Equation
(7.6). Like the GA, APPS was sensitive to the initialization of the optimization
and required a starting point as described above; otherwise the algorithm
would converge to a suboptimal six-well design. However, given a good initial point,
the algorithm converged to a comparable solution within 200 function evaluations.
APPS has also been successfully applied to problems in microfluidics, biology,
thermal design, and forging. (See [40] and references therein and the URL
https://software.sandia.gov/appspack/ for user success stories and
detailed examples.)

7.3.2.2 Implicit Filtering 

Implicit filtering is based on the notion that if derivatives were available and
reliable, Newton-like methods would yield fast results. The method evaluates a stencil
of points at each iteration, which is used simultaneously to form finite difference gradients
and as a pattern for direct search [35]. A secant approach, called a quasi-Newton
method, is then used to solve the resulting system of nonlinear equations at each
iteration [61] while avoiding finite difference approximations of the Hessian matrix. In contrast
to classical finite-difference-based Newton algorithms, implicit filtering begins with
a much larger stencil to account for noise. The stencil size is reduced as the
optimization progresses to improve the accuracy of the gradient approximation and to take
advantage of the fast convergence of quasi-Newton methods near a local minimum.

To solve the WS problem, a FORTRAN implementation called IFFCO (Implicit
Filtering For Constrained Optimization) was used. IFFCO uses a symmetric
rank-one (SR1) quasi-Newton update [11]. The user must supply an objective function and
an initial iterate, and the optimization is terminated when a function evaluation
budget is exhausted or when the prescribed number of stencil reductions has been reached.
We denote the finite difference gradient with increment size $h$ by $\nabla_h f$ and the model
Hessian matrix by $H$. For each $h$, the projected quasi-Newton iteration proceeds
until the center of the finite difference stencil yields the smallest function value (stencil failure) or
$\|\nabla_h f\| < \tau h$, which means the gradient has been reduced as much as possible on
the current scale. After this, the difference increment is halved (unless the user has
specified a particular sequence of increments) and the optimization proceeds until
the function evaluation budget is met.

The general unconstrained algorithm can be outlined as follows:

While not converged

Do until $\|\nabla_h f\| < \tau h$

1. Compute $\nabla_h f$

2. Find the least integer $m \ge 0$ such that sufficient decrease holds for the step length $\lambda = \beta^m$, $0 < \beta < 1$

3. $x = x - \lambda H^{-1} \nabla_h f(x)$

4. Update $H$ via a quasi-Newton method
Reduce $h$
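The following Python sketch follows the outline above on an unconstrained problem. It is illustrative only and is not IFFCO: as an assumption, it uses a BFGS update (with a curvature check that keeps $H$ positive definite) in place of IFFCO's SR1 update, forms central differences on the stencil $x \pm h e_i$, and takes $\lambda = \beta^m$ with an Armijo-type sufficient decrease test. The function names and default parameters (`tau`, `beta`, `h_min`, and so on) are hypothetical.

```python
import numpy as np


def stencil_gradient(f, x, h):
    """Central differences on the stencil x +/- h e_i; also return the stencil values."""
    grad = np.zeros(x.size)
    values = []
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        f_plus, f_minus = f(x + e), f(x - e)
        grad[i] = (f_plus - f_minus) / (2.0 * h)
        values += [f_plus, f_minus]
    return grad, values


def implicit_filtering(f, x0, h0=0.25, h_min=1e-3, tau=1.0, beta=0.5,
                       max_backtracks=20, max_iters=100):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    h = h0
    while h >= h_min:                        # outer loop over stencil sizes
        H = np.eye(x.size)                   # model Hessian, reset for each h
        g, stencil = stencil_gradient(f, x, h)
        for _ in range(max_iters):
            # stop at this scale on stencil failure or a small gradient
            if fx <= min(stencil) or np.linalg.norm(g) < tau * h:
                break
            d = -np.linalg.solve(H, g)       # quasi-Newton direction
            lam = 1.0
            for _ in range(max_backtracks):  # lambda = beta**m, least m that works
                x_new = x + lam * d
                f_new = f(x_new)
                if f_new < fx + 1e-4 * lam * (g @ d):  # sufficient decrease
                    break
                lam *= beta
            else:
                break                        # line search failed: shrink the stencil
            g_new, stencil = stencil_gradient(f, x_new, h)
            s, y = x_new - x, g_new - g
            if y @ s > 1e-12:                # BFGS update keeps H positive definite
                Hs = H @ s
                H = H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / (y @ s)
            x, fx, g = x_new, f_new, g_new
        h /= 2.0                             # reduce h
    return x, fx
```

On a perturbed quadratic such as $f(x) = x^2 + 0.01\sin(40x)$ started from $x_0 = -1.25$ with $h = 0.25$, the wide initial stencil averages out most of the oscillation in the difference gradient, so the early steps behave almost as if the function were the underlying quadratic.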

This can be illustrated on a small perturbation of a quadratic function, as shown
in Figure 7.4. Given an initial iterate $x_0 = -1.25$ and $h = 0.25$, the resulting
centered finite difference stencil is shown on the left. The center of the stencil, $f(x_0)$,
is denoted with "*" and $f(x_0 \pm h)$ is denoted with "o". Since the center of the
stencil does not have the lowest function value, stencil failure has not occurred, and the
algorithm would proceed and take a descent step. Then, suppose the next iterate $x_1$ is as
in the center picture of Figure 7.4. Here the lowest function value on the stencil occurs at
the center, $f(x_1)$, and thus stencil failure has occurred. In this case, $h$ is reduced by half.
The stencil would then be as in the right picture, stencil failure would not occur, and the
algorithm would proceed.



IFFCO was used to solve the WS problem in [32] and [29] using the inactive-well
threshold, and in [50] using a multiplicative penalty term to determine the number of
wells. The behavior for both implementations was similar to that of APPS in that a
good initial iterate was needed to identify the five-well solution. With good initial
data, IFFCO identified the solution within 200 function evaluations.

IFFCO and the implicit filtering algorithm in general have been successfully
applied to a variety of other challenging simulation-based optimization applications,
including mechanical engineering [12], polymer processing [31], and physiological
modeling [30]. There are several convergence theorems for implicit filtering,
which was particularly designed for the optimization of noisy functions with bound
constraints [35].

Fig. 7.4 The illustration on the left shows the first implicit filtering stencil for a small
perturbation of a quadratic function. The center picture shows stencil failure, and the picture
on the right illustrates a new stencil with a reduced step size

Linear and nonlinear constraints may be incorporated into the objective function
via a penalty or barrier approach. The default in IFFCO is to handle constraint
violations using an extreme barrier approach, which simply assigns a large function value to
any infeasible point. The performance of IFFCO on nonsmooth, nonconvex, noisy
problems, and even on those with disconnected feasible regions, is strong, but its
dependence on the initial starting point is well documented [8, 61]. Also, note that
IFFCO includes a projection operator to handle bound constraints.
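Both constraint-handling devices mentioned above are easy to express as wrappers around a black-box objective. The sketch below is a hypothetical illustration, not IFFCO's interface: `extreme_barrier` returns a fixed large value (an assumed `1e10`) for infeasible points without calling the simulator, and `project_to_bounds` is the componentwise projection onto a box.

```python
import numpy as np


def extreme_barrier(f, is_feasible, big=1e10):
    """Return an objective that assigns `big` to infeasible points without calling f."""
    def wrapped(x):
        if not is_feasible(x):
            return big  # the (expensive) simulator is never called here
        return f(x)
    return wrapped


def project_to_bounds(x, lower, upper):
    """Componentwise projection of x onto the box lower <= x <= upper."""
    return np.clip(x, lower, upper)
```

Because the feasibility check runs first, infeasible candidates cost nothing beyond the check itself, which matters when each call to `f` launches a simulation.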



7.3.2.3 DIRECT 



DIRECT, an acronym for Dividing RECTangles, was designed for global optimization
of bound-constrained problems as an extension of Shubert's Lipschitz optimization
method [58]. Since its introduction in the early 1990s, a significant number of
papers have been written analyzing, describing, and developing new variations of
this highly effective algorithm.
DIRECT is essentially a partitioning algorithm that sequentially refines the
region defined by the bound constraints by selecting hyper-rectangles to trisect at
each iteration [27, 33, 58]. To balance the global and local search, at each iteration a set $S$ of
potentially optimal rectangles is identified based on the function value at the center
of each rectangle and the size of the rectangle. The basic algorithm is as follows:

1. Normalize the bound constraints to form a unit hypercube search space with
center $c_1$

2. Evaluate $f(c_1)$, set $f_{\min} = f(c_1)$, and initialize the iteration counter $k$

3. Evaluate $f(c_1 \pm \frac{1}{3} e_i)$, $1 \le i \le n$, where $e_i$ is the $i$th unit vector

4. While not converged Do

a. Identify the set $S$ of all potentially optimal rectangles

b. For all $j \in S$, identify the longest sides of rectangle $j$, evaluate $f$ at the centers
of the new rectangles, and trisect $j$ into smaller rectangles

c. Update $f_{\min}$, $k = k + 1$
Note that DIRECT requires that both the upper and lower bounds be finite. The
algorithm begins by mapping the rectangular feasible region onto the unit hypercube;
that is, DIRECT optimizes the transformed problem

$$ \min_{\hat{x} \in \mathbb{R}^n} \hat{f}(\hat{x}) = f(S\hat{x} + \ell) \quad \text{subject to} \quad 0 \le \hat{x} \le e, \qquad (7.8) $$

where $\hat{x} = S^{-1}(x - \ell)$ with $S = \operatorname{diag}(u_1 - \ell_1, \ldots, u_n - \ell_n)$, $\ell$ and $u$ are the vectors of
lower and upper bounds, and $e$ is the vector of ones. Figure 7.5 illustrates three
iterations of DIRECT for a two-dimensional example. At each iteration, candidate
potentially optimal hyper-rectangles are selected and refined. Although other stopping
criteria exist, this process typically continues until a user-defined budget of function
evaluations has been expended.
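The change of variables in (7.8) is a componentwise affine map. The helper below is a minimal sketch; its name and return values are hypothetical. It builds $\hat{f}(\hat{x}) = f(S\hat{x} + \ell)$ together with the maps between $x$ and $\hat{x}$, and rejects infinite bounds since DIRECT requires finite ones.

```python
import numpy as np


def unit_cube_problem(f, lower, upper):
    """Return (f_hat, to_unit, from_unit) for the transformed problem (7.8)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if not (np.all(np.isfinite(lower)) and np.all(np.isfinite(upper))):
        raise ValueError("DIRECT needs finite lower and upper bounds")
    scale = upper - lower  # diagonal of S

    def from_unit(x_hat):  # x = S x_hat + l
        return lower + scale * np.asarray(x_hat, dtype=float)

    def to_unit(x):  # x_hat = S^{-1} (x - l)
        return (np.asarray(x, dtype=float) - lower) / scale

    def f_hat(x_hat):  # objective on 0 <= x_hat <= e
        return f(from_unit(x_hat))

    return f_hat, to_unit, from_unit
```

For bounds $[-5, 5] \times [0, 10]$, the center of the unit cube, $(0.5, 0.5)$, maps back to $(0, 5)$, the center of the original box.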



Fig. 7.5 For a two-dimensional problem, DIRECT iteratively subdivides the optimal
hyper-rectangle into thirds



The criterion for a hyper-rectangle to be potentially optimal, given a constant $\epsilon > 0$,
is as follows [58]. Suppose there are $K$ enumerated hyper-rectangles subdividing
the unit hypercube from Equation (7.8), with centers $c_i$, $1 \le i \le K$. Let $\gamma_i$ denote
the corresponding distance from the center $c_i$ to its vertices. A hyper-rectangle $j$ is
considered potentially optimal if there exists $\alpha_K > 0$ such that

$$ f(c_j) - \alpha_K \gamma_j \le f(c_i) - \alpha_K \gamma_i, \quad 1 \le i \le K, \qquad (7.9) $$

$$ f(c_j) - \alpha_K \gamma_j \le f_{\min} - \epsilon |f_{\min}|. \qquad (7.10) $$



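The potentially optimal hyper-rectangles are those lying on the lower-right convex hull of the
point set $\{(\gamma_i, f(c_i))\}$, as Figure 7.6 illustrates. Notice that the user-defined parameter $\epsilon$
controls whether the algorithm performs more of a global or a local search: larger values of $\epsilon$
exclude small rectangles whose potential improvement over $f_{\min}$ is marginal.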
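Conditions (7.9) and (7.10) can also be checked directly, without constructing the hull: comparing rectangle $j$ with smaller rectangles gives a lower bound on $\alpha_K$, comparing it with larger rectangles gives an upper bound, and (7.10) gives a further lower bound. The Python function below is a sketch of that test; the function name and the default $\epsilon = 10^{-4}$ are assumptions.

```python
import numpy as np


def potentially_optimal(f_centers, gammas, f_min, eps=1e-4):
    """Indices j satisfying (7.9)-(7.10) for some alpha_K > 0."""
    f = np.asarray(f_centers, dtype=float)
    g = np.asarray(gammas, dtype=float)
    chosen = []
    for j in range(f.size):
        if f[j] > f[g == g[j]].min():
            continue  # (7.9) fails against a rectangle of the same size
        smaller, larger = g < g[j], g > g[j]
        # (7.9) against smaller rectangles bounds alpha_K from below,
        # against larger rectangles from above
        lo = np.max((f[j] - f[smaller]) / (g[j] - g[smaller]), initial=0.0)
        hi = np.min((f[larger] - f[j]) / (g[larger] - g[j]), initial=np.inf)
        if hi <= 0 or lo > hi:
            continue  # no alpha_K > 0 satisfies (7.9) for all i
        # (7.10) requires alpha_K >= (f_j - f_min + eps * |f_min|) / gamma_j
        needed = (f[j] - f_min + eps * abs(f_min)) / g[j]
        if needed <= hi:
            chosen.append(j)
    return chosen
```

For rectangles with sizes $\gamma = (1, 1, 2, 2, 3)$, center values $(1.0, 2.0, 0.5, 0.8, 1.5)$ and $f_{\min} = 0.5$, the function selects indices 2 and 4: the best rectangle of size 2 and the single largest rectangle.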

Although DIRECT has been shown to be highly effective for relatively small
problems and has proven global convergence, it does suffer at higher dimensions
[87, 80, 28] and can require a very large number of function evaluations. In [29],
DIRECT was unable to identify a five-well solution to the WS problem when starting
with an initial six-well configuration and using the constraint in Equation (7.6).
These results are not surprising given that the five-well solution has all of the
pumping rates lying on the bound constraint, whereas DIRECT samples only the centers of
hyper-rectangles, which approach the boundary slowly. The sampling strategy of DIRECT
therefore does not make it a good candidate for this problem.
[Figure: horizontal axis, rectangle size; legend, nonoptimal and potentially optimal rectangles]

Fig. 7.6 Potentially optimal hyper-rectangles can be found by forming the convex hull of
the set $\{f(c_i), \gamma_i\}$, where $c_i$ denotes the center point of the $i$th hyper-rectangle and $\gamma_i$ the
corresponding distance to the hyper-rectangle's vertices



7.3.3 Statistical Emulation 



An alternative approach to optimization is statistical emulation, wherein
previous runs of the computer code are used to train a statistical model, and the model
is used to draw inferences about the location of the optimum. The idea of using a
stochastic process to approximate an unknown function dates back as far as Poincaré
in the 19th century [24]. In particular, a Gaussian process (GP) is typically used for
the emulation of computer simulators [78, 79]. The output of the simulator is treated
as a random variable $Y(x)$ that depends on the input vector $x$ such that the response
varies smoothly; this smoothness is given by the covariance structure of the GP.
The mean and covariance functions determine the characteristics of the process, as
any finite set of locations has a joint multivariate Gaussian distribution [18, 81]. A
Bayesian approach allows full estimation of uncertainty, which is useful when
trying to determine the probability that an unsampled location will be an improvement
over the current known optimum.

Specifically, the uncertainty about future computer evaluations can be
quantified by finding the predictive distribution for new input locations conditional on the
points that have already been evaluated. Since this is a full probabilistic model
for code output at new locations, any statistic depending upon this output is easily
obtained. The expected improvement at a point $x$, $E[\max(f_{\min} - Y(x), 0)]$, is a useful
criterion for choosing new locations for evaluation; the paper [59] illustrates the
use of this statistic in optimization. Since the improvement is a random variable, this
criterion balances rewarding points where the output is highly uncertain against points
where the function is predicted to be better than the present best point. In practice, a
number of candidate locations are generated from an optimal space-filling design,
a GP model is fit to the existing output, and the expected improvement is
calculated at each candidate location. The points with the highest expected
improvement are selected as candidates for the new best point.
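When the predictive distribution at $x$ is Gaussian with mean $\mu(x)$ and standard deviation $\sigma(x)$, the expected improvement has a closed form: with $z = (f_{\min} - \mu)/\sigma$, $E[\max(f_{\min} - Y(x), 0)] = (f_{\min} - \mu)\Phi(z) + \sigma\phi(z)$, where $\Phi$ and $\phi$ are the standard normal CDF and density. The Python sketch below applies this to a plain (not treed) GP with a squared-exponential covariance and fixed, assumed hyperparameters; the function names and settings are hypothetical, and a real emulator would estimate the hyperparameters from the data.

```python
import math

import numpy as np


def sq_exp_kernel(A, B, length=0.3, variance=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length**2)


def gp_predict(X, y, X_new, noise=1e-8):
    """Posterior mean and standard deviation of a GP with fixed hyperparameters."""
    y_mean = y.mean()  # constant prior mean, estimated crudely
    K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
    K_s = sq_exp_kernel(X, X_new)
    L = np.linalg.cholesky(K)
    w = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    mean = y_mean + K_s.T @ w
    v = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp_kernel(X_new, X_new)) - (v**2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def expected_improvement(mean, std, f_min):
    """Closed-form E[max(f_min - Y(x), 0)] for Y(x) ~ N(mean, std^2)."""
    ei = np.empty_like(mean)
    for k, (m, s) in enumerate(zip(mean, std)):
        if s <= 0.0:
            ei[k] = max(f_min - m, 0.0)
            continue
        z = (f_min - m) / s
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        ei[k] = max((f_min - m) * cdf + s * pdf, 0.0)  # guard against round-off
    return ei
```

Expected improvement is essentially zero at points already evaluated (where the posterior standard deviation collapses) and largest where a low predicted mean and a large uncertainty coincide, which is the balance described above.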

Standard GP models have several drawbacks, including strong assumptions of
stationarity and poor computational scaling. To reduce these problems, treed Gaussian
process (TGP) models partition the input space using a recursive tree structure,
and independent GP models are fit within each partition [37, 38]. Such models are
a natural extension of standard GP models and combine partitioning ideas with
Bayesian methods to produce smooth fitted functions [9]. The partitions can be fit
simultaneously with the parameters of the embedded GP models using reversible
jump Markov chain Monte Carlo [41].

Note that statistical emulation via TGP has the disadvantage of computational
expense: as additional points are evaluated, the work required to build the GP
model increases significantly (for a standard GP, the cost of factoring the covariance
matrix grows cubically with the number of points). This, coupled with convergence
issues as TGP approaches the solution, indicates that TGP alone is not an effective
method for solving the WS problem. However, TGP is an excellent candidate for
inclusion in a hybrid, because a complementary method can compensate for these disadvantages.



7.4 Some DFO Hybrids 

In order to take advantage of the benefits of more than one optimization
approach and to try to overcome method-specific shortcomings, two or more optimization
methods may be combined, forming a hybrid. Hybrid optimization is a popular
approach in the combinatorial optimization community, where metaheuristics (such
as GAs) are combined to improve robustness and blend the distinct strengths of
different approaches [83]. More recently, metaheuristics have been combined with
deterministic methods (such as pattern search) to form hybrids that simultaneously
perform global and local searches [73, 91, 86, 26].



The use of hybrids in the combinatorial optimization community has grown to
include a categorization scheme for hybrids [76] with four main characteristics:
1) the class of algorithms used to form the hybrid, 2) the level of hybridization,
3) the order of execution, and 4) the control strategy. Choosing algorithms to hybridize
is a significant challenge in forming hybrids; methods that have complementary
advantages and are well suited to the problem of interest should be selected. The level
of hybridization can be loose or tight. In general, loosely coupled approaches
retain the individual identities of the methods being hybridized. In contrast, tightly
coupled hybrids exhibit a strong relationship between the individual pieces and may
share components or functions. Loosely coupled hybrids are advantageous from
both a software development and a theoretical perspective: they do not require the
re-implementation of existing methods, and they keep the theoretical convergence
properties of the individual methods intact. The order of execution of hybrid algorithms
can be either sequential or parallel. Sequential hybrid methods string together a
set of algorithms head to tail, using the results of a completed run of one algorithm
to seed the next. From this perspective it is often unclear whether the previously
executed algorithms should be viewed simply as a preprocessing step, or whether the
ensuing algorithm runs should be viewed as post-processing. On the other hand,
parallel hybrids execute the individual methods simultaneously and can thus be
made collaborative, sharing information dynamically to improve performance.
The control strategy of hybrid algorithms can be either integrative
or collaborative. In a purely integrative approach, one algorithm is subordinate
to, or an embedded component of, another. Collaborative methods give equal
importance and control to both algorithms, which merely exchange information
instead of being an integral part of one another.

Note that the effectiveness of a hybrid approach to optimization may be
compromised if the methods combined are not suited to one another or to the application of
interest. In this section, we discuss hybrid optimization in the context of the water
resources management problem WS. Four hybrid methods are described here, only
two of which are successful for the WS problem. We include the two unsuccessful
hybrids to demonstrate the importance of tailoring hybrids to the characteristics
of the problem being solved. To test these designs, the hybrids were applied to the WS
problem without a starting point.

7.4.1 APPS-TGP 

Some optimization methods have introduced an oracle to predict additional points at
which a decrease in the objective function might be observed. Analytically, an oracle
is free to choose points by any finite process (see [63] and references therein).
The addition of an oracle is particularly amenable to pattern search methods such as
APPS, since the iterate(s) suggested by the oracle are merely additions to the pattern.
Furthermore, the asynchronous nature of the APPSPACK implementation makes it
adept at handling the evaluation of the additional points. The idea of an oracle is
used as a basis for creating a hybrid optimization scheme that combines APPS
and the statistical emulator TGP.

In the APPS-TGP hybrid, the TGP statistical model serves as the oracle. The
goals of using the TGP oracle include added robustness and the introduction of
some global properties to APPSPACK. When the oracle is called, the TGP algorithm
is applied to the set of evaluated iterates in order to choose additional candidate
points. In other words, APPSPACK still optimizes as normal, but throughout
the optimization process the iterate pairs $(x', f(x'))$ are collected. Then, the TGP
model is fit to the existing output, and the expected improvement is calculated at
each candidate location. The points with the highest expected improvement are passed
back to the APPS algorithm, which determines whether any of them is a new best point. If not,
the points are merely discarded and the APPS algorithm continues without any changes.
However, if a TGP point is a new best point, the APPSPACK search pattern continues from
that new location. The general flow of this algorithm is illustrated in Figure 7.7.
Note that both APPS and TGP generate points, and both methods are informed of
the function values associated with these iterates.

This hybrid technique is loosely coupled, as APPS and TGP run independently of
each other. Since the iterates suggested by the TGP algorithm are used in addition
to the iterates suggested by APPSPACK, there is no adverse effect on the local
convergence properties of APPS. As noted earlier, pattern search methods have strong
local convergence properties [25, 66, 85]. However, their weakness is that they are
local methods. In contrast, TGP performs a global search of the feasible region
but does not have strong local convergence properties. Hence, in this hybridization
scheme, TGP lends a global scope to the pattern search, and the pattern search
further refines TGP iterates by local search. This benefit is clearly illustrated on a
model calibration problem from electrical engineering in [82] and on a groundwater
remediation problem in [39].

Fig. 7.7 The flow of the APPS-TGP hybrid. Both APPS and TGP generate iterates. The
iterates are merged into one list. Then, the function value of each iterate is either obtained
from cache or evaluated. Finally, the results are shared with both methods

APPS-TGP is also collaborative, since APPS and TGP are essentially run
independently of one another. From the perspective of TGP, a growing cache of function
evaluations is being cultivated, and the sole task of TGP is to build a model and
select a new set of promising points to be evaluated. The TGP algorithm does not
depend on where this cache of points comes from. Thus, in this approach, we may
easily incorporate other optimization strategies, where each strategy is simply viewed
as an external point-generating mechanism leveraged by TGP. From the perspective
of APPS, points suggested by TGP are interpreted in an identical fashion to other
trial points and are ignored unless deemed better than the current best point held
by APPS. Thus neither algorithm is aware that a concurrent algorithm is running in
parallel. However, the hybridization is integrative in the sense that points submitted
by TGP are given a higher priority in the queue of iterates to be evaluated. In the
parallel execution of APPS and TGP, TGP is given its own processor (because building
the TGP model is computationally expensive), while APPS directs the use of the remaining
processors to perform point evaluations. Communication between TGP and APPSPACK occurs
intermittently throughout the optimization process, whenever TGP completes and
is ready to look at a new cache of points.
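The merge-and-evaluate step of Figure 7.7 can be sketched as follows. This is a hypothetical illustration, not the APPSPACK interface: oracle (TGP) points are placed ahead of APPS trial points to mimic their higher priority, repeated points are served from a cache keyed on rounded coordinates, and, as an assumption, the best returned point is accepted using the sufficient decrease test (7.7) with the current step size.

```python
import numpy as np


def merge_and_evaluate(f, apps_points, oracle_points, cache):
    """Merge oracle (TGP) and APPS points, reuse cached values, evaluate the rest."""
    results = []
    for p in list(oracle_points) + list(apps_points):  # oracle points go first
        key = tuple(np.round(np.asarray(p, dtype=float), 12))  # hashable cache key
        if key not in cache:
            cache[key] = f(np.asarray(p, dtype=float))  # evaluate only unseen points
        results.append((key, cache[key]))
    return results  # shared with both methods


def update_best(x_best, f_best, results, delta, alpha=1e-4):
    """Accept the best returned point only if it passes the sufficient decrease test (7.7)."""
    key, value = min(results, key=lambda r: r[1])
    if value - f_best < -alpha * delta**2:
        return np.array(key), value
    return x_best, f_best
```

Because a TGP point passes through the same acceptance test as any APPS trial point, a poor oracle suggestion costs one function evaluation and is otherwise ignored, which is why the oracle does not disturb the local convergence properties of APPS.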
The APPS-TGP method was specifically designed to address the disadvantages
associated with using a local method like APPS. Specifically, APPS usually requires
a "good" starting point to find an optimal solution and avoid becoming trapped in a local
minimum. This was demonstrated for the WS problem in [29], where APPS failed to
find the optimal 5-well solution for 100 alternative starting points. The addition of
TGP provides a global scope to help overcome the inherently local characteristics
of APPS. When APPS-TGP was applied to the WS problem, an optimal 5-well
solution was obtained with fewer than 500 function evaluations.

7.4.2 EAGLS 

To address mixed-variable, nonlinear optimization problems (MINLPs) of the form

$$ \begin{aligned} \min_{x \in \mathbb{R}^{n_r},\, z \in \mathbb{Z}^{n_b}} \quad & f(z, x) \\ \text{subject to} \quad & c(z, x) \le 0, \\ & z_\ell \le z \le z_u, \\ & x_\ell \le x \le x_u, \end{aligned} $$

where $c : \mathbb{Z}^{n_b} \times \mathbb{R}^{n_r} \to \mathbb{R}^m$, consider a hybrid of a GA and a direct search. The APPS-GA
hybrid, commonly referred to as EAGLS (Evolutionary Algorithm Guiding Local
Search), uses the GA's handling of integer and real variables for global search and
APPS's handling of real variables in parallel for local search [43].

As previously discussed, a GA carries forward a population of points that are
iteratively mutated, merged, selected, or dismissed. However, individuals in the
population are not given the opportunity to improve within a generation. This
does not reflect the real world, where an organism is not constant throughout its
life span but instead can grow, improve, or become stronger. EAGLS allows such
improvements within a generation. The GA still governs point survival, mutation,
and merging as an outer iteration, but during an inner iteration, individual points
are improved via APPS applied to the real variables, with the integer variables held
fixed. For simplicity, consider the synchronous EAGLS algorithm:

1. Evaluate initial population

2. While not converged Do 

a. Perform selection, mutation, crossover 

b. Evaluate new points 

c. Choose points for local search 

d. Make multiple calls to APPS for real-valued subproblems 
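To make the synchronous loop concrete, the following Python sketch pairs a small elitist GA with a local search on the real variables, holding the integer variable fixed during local search. It is a toy under explicit assumptions, not the EAGLS code: a single integer variable, binary tournament selection, blend crossover with Gaussian mutation, and a simple compass search standing in for APPS. Step c simply takes the best few individuals rather than applying the EAGLS ranking. All names and settings are hypothetical.

```python
import numpy as np


def compass_search(f, z, x, step=0.5, tol=1e-3):
    """Stand-in for APPS: improve the real variables x with the integer z held fixed."""
    fx = f(z, x)
    while step > tol:
        improved = False
        for i in range(x.size):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                f_trial = f(z, trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx


def eagls_sketch(f, z_choices, n_real, pop_size=12, generations=10, n_local=2, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Evaluate initial population of (integer, real-vector) individuals
    pop = [(int(rng.choice(z_choices)), rng.uniform(-2.0, 2.0, n_real)) for _ in range(pop_size)]
    vals = [f(z, x) for z, x in pop]
    for _ in range(generations):                      # 2. While not converged
        children = []
        for _ in range(pop_size):                     # a. selection, crossover, mutation
            i, j = rng.choice(pop_size, size=2, replace=False)
            parent = pop[i] if vals[i] < vals[j] else pop[j]  # binary tournament
            mate = pop[rng.integers(pop_size)]
            w = rng.uniform(size=n_real)
            x = w * parent[1] + (1.0 - w) * mate[1] + rng.normal(0.0, 0.1, n_real)
            z = parent[0] if rng.uniform() < 0.8 else int(rng.choice(z_choices))
            children.append((z, x))
        child_vals = [f(z, x) for z, x in children]   # b. evaluate new points
        pool, pool_vals = pop + children, vals + child_vals
        keep = np.argsort(pool_vals)[:pop_size]       # elitist survival
        pop = [pool[k] for k in keep]
        vals = [pool_vals[k] for k in keep]
        for k in range(n_local):                      # c./d. local search on the best few
            z, x = pop[k]
            x_new, f_new = compass_search(f, z, x.copy())
            pop[k], vals[k] = (z, x_new), f_new
    best = int(np.argmin(vals))
    return pop[best], vals[best]
```

Because survival is elitist and the local search accepts only improvements, the best objective value in this sketch can never get worse from one generation to the next.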

Of course, allowing the entire population to improve in this way would be
computationally prohibitive. Thus, EAGLS employs a ranking algorithm that takes into
account each individual's proximity to other, better points. The goal of this step is to
select promising individuals representing distinct subdomains. Note that the flow of the
asynchronous EAGLS algorithm is slightly different from that of APPS-TGP. In this case,
NSGA-II generates iterates, and multiple instances of APPS also generate iterates.
Returned function values are distributed to the appropriate instance of APPS or to the GA.
This is illustrated in Figure 7.8.

Fig. 7.8 The flow of EAGLS. The NSGA-II algorithm generates iterates. Then, some iterates
are selected for refinement by multiple instances of APPS. The iterates are merged into one
list, and the function value of each iterate is either obtained from cache or evaluated. Finally,
the results are returned so that the APPS instances and the GA can proceed
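One simple way to rank individuals by their proximity to better points is sketched below under our own assumptions; it is not necessarily the EAGLS ranking. Each individual is scored by its distance to the nearest strictly better individual, and the highest scores are chosen: the best individual always scores $+\infty$, while an individual sitting next to a better one scores low because its subdomain is already represented. The function name is hypothetical.

```python
import numpy as np


def select_for_local_search(points, values, count):
    """Rank individuals by distance to the nearest strictly better individual."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    score = np.full(values.size, np.inf)  # the best point scores +inf
    for i in range(values.size):
        better = values < values[i]
        if better.any():
            score[i] = np.linalg.norm(points[better] - points[i], axis=1).min()
    order = np.argsort(-score, kind="stable")  # most isolated from better points first
    return [int(i) for i in order[:count]]
```

For points $(0, 0)$, $(0.1, 0)$, $(5, 5)$, $(5.1, 5)$ with values 1, 2, 3, 4, choosing two individuals returns indices 0 and 2, one from each cluster, rather than the two best values, which lie in the same cluster.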

Note that it is the combinatorial nature of integer variables that makes the solution
of MINLPs difficult. If the integer variables are relaxable (i.e., the objective function
is defined for non-integer values of these variables), more sophisticated schemes such as
branch and bound may be preferable. However, for simulation-based optimization
problems, the integer variables often represent a descriptive category (e.g., a color or a
building material) and may lack the natural ordering required for relaxation. That
is, there may be no well-defined mathematical definition of what is meant by
"nearby." In the WS problem, the number of wells is not a relaxable variable
because, for example, one-half of a well cannot be installed. The other results for the
WS problem given in this chapter consider the strictly real-valued WS formulation.
However, since EAGLS was designed to handle MINLPs, it was applied to the
MINLP formulation of WS. Moreover, EAGLS combines a global and a local search
in order to take advantage of the global properties of the GA while overcoming its
computational expense.

To illustrate the global properties of EAGLS, the problem was solved without
an initial point. In [43], EAGLS was able to locate a five-well solution using only
random points in the initial GA population, and it did so after about
65 function evaluations. This is an improvement over both the GA and the local
search method APPS. The function evaluation budget was 3000, and roughly 1000
of those evaluations were spent on points that did not satisfy the linear constraint in
Equation (7.5), which means the simulator was never called for them.
7.4.3 DIRECT-IFFCO

A simple sequential hybrid was proposed in [8], where the global search strengths
of DIRECT were used to generate feasible starting points for IFFCO. This hybrid
further addresses the weakness that DIRECT may require a large number of
function evaluations to find a highly accurate solution. In that work, DIRECT and
IFFCO (initialized using random points) were each compared to the sequential
pairing on a gas pipeline design application and a set of global test problems.
The pipeline problems were particularly challenging since the underlying objective
function would often fail to return a value, which is referred to as a hidden constraint
in simulation-based or black-box optimization. DIRECT showed some evidence of
robustness in terms of locating global minima but often required an excessive
number of function evaluations. IFFCO alone showed mixed performance, sometimes
refining the best solution once a feasible point was located but often converging to
a suboptimal local minimum. For the hybrid, the results were promising: even if
the function value at the end of the DIRECT iteration was high, IFFCO was able
to use the results to avoid entrapment in a local minimum. In fact, using DIRECT as
a generator of starting points for local searches has been actively studied over the
years and applied to a variety of applications. For example, in [17] DIRECT was
paired with a sequential quadratic programming method for the local search and
outperformed a variety of other global methods applied to an aircraft design
problem, and in [70] a gradient-based local search was shown to accelerate convergence
to the global minimum for a flight control problem.

A different idea was used in [29], where DIRECT was used in conjunction with
IFFCO to find starting points for the WS problem. In this case, DIRECT was used to 
minimize an aggregate of constraint violation and thereby identify sets of feasible 
starting points and then IFFCO was used to minimize the true objective function. 
This approach was not successful in that the points identified by DIRECT were so 
close to multiple local minima that IFFCO was unable to improve the objective func- 
tion value. In particular, IFFCO would only converge if initial points contained five 
wells operating on the bound constraint for their pumping rates, and DIRECT did 
not identify any points of this sort. The advantages obtained by combining DIRECT 
and IFFCO do not address the characteristics of the WS problem that make it diffi- 
cult to solve. However, it should be noted that the idea of using DIRECT and IFFCO 
together in this sort of bi-objective approach certainly warrants further investigation 
despite the performance on the WS problem. 

7.4.4 DIRECT-TGP 

Another attempt to improve the local search of DIRECT combines DIRECT with a TGP
surrogate and applies a gradient-based method to the surrogate model, which is cheap
to minimize [52]. Hybridization in this case is performed at the iteration level, in that
the center of the current rectangle is used as a starting point for a local search on the
surrogate. Essentially, the procedure for dividing hyper-rectangles in Step 4b of the
DIRECT algorithm in Section 7.3.2.3 is replaced with the following steps once the number
of function evaluations is larger than $2n + 1$, which allows for the initial hypercube sampling:

1. Build a TGP surrogate using all known function evaluations

2. Start a local search on the surrogate, constrained to the rectangle, using the center
of the rectangle as the initial point

3. Evaluate $f$ at the local optimum, $x_{loc}$

4. Return $f(x_{loc})$ instead of $f(c_j)$
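Steps 2 to 4 can be sketched as a replacement for the single center evaluation in DIRECT. In the Python sketch below, `surrogate` is a hypothetical stand-in for the TGP mean (any cheap callable will do), the local search is projected gradient descent using finite-difference gradients of the surrogate, and the step size `lr` and iteration count are assumed values. The expensive `f` is evaluated only once, at the returned point.

```python
import numpy as np


def sample_rectangle(f, surrogate, center, lower, upper, steps=200, lr=0.1, h=1e-6):
    """DIRECT-TGP-style sampling of one rectangle (sketch)."""
    x = np.array(center, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    for _ in range(steps):
        # Step 2: central finite-difference gradient of the cheap surrogate
        grad = np.array([(surrogate(x + h * e) - surrogate(x - h * e)) / (2 * h)
                         for e in np.eye(x.size)])
        x = np.clip(x - lr * grad, lower, upper)  # stay inside the rectangle
    # Steps 3-4: evaluate the expensive f once, at x_loc, and report that value
    return x, f(x)
```

With the surrogate equal to $\sum_i (x_i - 0.8)^2$ on the rectangle $[0, 0.5]^2$, the search from the center $(0.25, 0.25)$ ends at the corner $(0.5, 0.5)$, the constrained minimizer of the surrogate, so the rectangle reports $f(0.5, 0.5)$ instead of $f(0.25, 0.25)$.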

The algorithm, although relatively new, has been tested on a suite of bound con- 
strained and nonlinearly constrained problems and a cardiovascular modeling 
problem proposed in [30]. 

These promising preliminary results indicate this new hybrid can improve the 
local search capabilities of DIRECT. This is achieved without compromising the 
computational efficiency and with practically no additional algorithmic parameters 
to fine-tune. It should also be noted that other hybrids that attempt to improve the lo- 
cal search of DIRECT have been proposed. For example, [44] proposes a DIRECT- 
GSS hybrid. The resulting algorithm does show some promising results in terms 
of reducing the computational workload required to solve the optimization prob- 
lem, but it has only been investigated on test problems from the literature. Further 
tests are needed to determine its applicability to engineering applications. Given the 
performance of the DIRECT-IFFCO approach above, any local search hybrid with 
DIRECT would likely not perform well on the WS problem. 



7.5 Summary and Conclusion 

The purpose of this chapter is to introduce a number of derivative-free optimization
techniques available to address simulation-based problems in engineering. We have
shown their effectiveness on a problem from hydrological engineering. The purpose of
this exercise was not to compare methods, but instead to show the wide variety of
options available. In addition, we showed that some techniques do not address the
characteristics of some problems. We summarize these results in Table 7.2.

Finally, we note that another utility of hybrid optimization methods could be to inform the decision-making process. Currently, optimization algorithms accept guesses (or iterates) based solely on some notion of (sufficient) increase or decrease of the objective function. In order to serve as decision makers, next-generation algorithms must also consider rankings and probability metrics. For example, computational costs can be assessed so that iterates are only evaluated if they meet a set computational budget. This is particularly important for the expensive objective functions of simulation-based optimization problems. Moreover, hybrid optimization algorithms can incorporate tools that allow the user to dismiss subsets of the domain that exhibit large variances or that exceed critical thresholds. In addition, hybrid algorithmic frameworks are a step towards methods capable of generating a "robust" set of optimal solution options instead of a single optimum.
Table 7.2 Summary of the Performance of some Derivative-Free Methods for the WS Problem

| Method | Found 5-Well Solution | Number of Function Evaluations | Starting Point Required |
|---|---|---|---|
| APPS | Y | 176 | Y |
| IFFCO | Y | <200 | Y |
| DIRECT | N | - | Y |
| GA | Y | 161 | Y |
| APPS-TGP | Y | 492 | N |
| EAGLS | Y | 65 | N |



The current state of the art is to accept an iterate as an optimum based on the inability to find a better guess within a decreasing search region. This may lead to solutions
to design problems that are undesirable due to a lack of robustness to small design 
perturbations. Instead, algorithms that allow designers to choose a solution based 
on additional criteria can be created in the hybrid framework. For example, a re- 
gional optimum could be used to generate a set of multiple solutions from which 
the designer can choose. 
Chapter 8

Simulation-Driven Design in Microwave Engineering: Methods

Slawomir Koziel and Stanislav Ogurtsov 



Abstract. Today, electromagnetic (EM) simulation is inherent in the analysis and design of microwave components. Available simulation packages allow engineers to obtain accurate responses of microwave structures. At the same time, the task of microwave component design can be formulated and solved as an optimization problem where the objective function is supplied by an EM solver. Unfortunately, accurate simulations may be computationally expensive; therefore, optimization approaches with the EM solver directly employed in the optimization loop may be very time consuming or even impractical. On the other hand, computationally efficient microwave designs can be realized using surrogate-based optimization. In this chapter, simulation-driven design methods for microwave engineering are described where optimization of the original model is replaced by iterative re-optimization of its surrogate, a computationally cheap low-fidelity model which, at the same time, should have reliable prediction capabilities. These optimization methods include space mapping, simulation-based tuning, variable-fidelity optimization, and various response correction techniques.

Keywords: computer-aided design (CAD), microwave design, simulation-driven 
optimization, electromagnetic (EM) simulation, surrogate-based optimization, 
space mapping, surrogate model, high-fidelity model, low-fidelity model. 



8.1 Introduction 

Computer-aided full-wave electromagnetic analysis has been used in microwave 
engineering for a few decades. Initially, its main application area was design 
verification. Electromagnetic (EM) simulations can be highly accurate but, at the same time, they are computationally expensive. Automated EM-simulation-driven optimization was not possible until the 1980s, when faster CPUs as well as robust algorithms became available [1]. During the 1980s, commercial EM simulation software packages, e.g., those developed by Ansoft Corporation, Hewlett-Packard, and Sonnet Software, started appearing on the market. Formal EM-based optimization of microwave structures has been reported since 1994 [2]-[4].

In many situations, theoretical models of the microwave structures can only be 
used to yield the initial designs that need to be further tuned to meet performance 
requirements. Today, EM-simulation-driven optimization and design closure are becoming increasingly important due to the complexity of microwave structures and increasing demands for accuracy. Also, EM-based design is a must for a growing number of microwave devices such as ultrawideband (UWB) antennas [5] and substrate-integrated circuits [6]. For circuits like these, no design-ready theoretical models are available, so design improvement can only be obtained through geometry adjustments based on repetitive simulations.

In this chapter, some major challenges of EM-simulation-driven microwave de- 
sign are discussed, and traditional approaches that have been used over the years 
are reviewed. Certain microwave-engineering-specific approaches that aim at re- 
ducing the computational cost of the design process (in particular, system decom- 
position and co-simulation) are mentioned. We also characterize optimization 
techniques available in commercial EM simulation packages. 

The main focus of this chapter is on surrogate-based approaches that allow 
computationally efficient optimization. Fundamentals of surrogate-based micro- 
wave design as well as popular strategies for building surrogate models are dis- 
cussed. Special emphasis is put on surrogates exploiting physics-based low- 
fidelity models. 

8.2 Direct Approaches 

The microwave design task can be formulated as a nonlinear minimization problem

$$x_f^* \in \arg\min_x U(R_f(x)) \qquad (8.1)$$

where $R_f \in \mathbb{R}^m$ denotes the response vector of the device of interest, e.g., the modulus of the transmission coefficient $|S_{21}|$ evaluated at $m$ different frequencies. $U$ is a given scalar merit function, e.g., a minimax function with upper and lower specifications [7]. Vector $x_f^*$ is the optimal design to be determined. Normally, $R_f$ is obtained through computationally expensive electromagnetic simulation. It is referred to as the high-fidelity or fine model.
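As an illustration of a minimax merit function $U$ with upper and lower specifications, the sketch below returns the largest specification violation over the frequency sweep, so $U \le 0$ means that all specifications are met and $U > 0$ is the worst violation. This sign convention, the function name `minimax_merit`, and the specification values are assumptions made for the example.

```python
from typing import Optional, Sequence


def minimax_merit(response: Sequence[float],
                  upper: Sequence[Optional[float]],
                  lower: Sequence[Optional[float]]) -> float:
    """Minimax merit U(R): the largest specification violation over all frequencies.

    upper[k] / lower[k] are the upper / lower specs at the k-th frequency
    (None = no spec there). U <= 0 means every spec is met.
    """
    terms = []
    for r, su, sl in zip(response, upper, lower):
        if su is not None:
            terms.append(r - su)   # positive if the response exceeds an upper spec
        if sl is not None:
            terms.append(sl - r)   # positive if the response falls below a lower spec
    return max(terms)


# Hypothetical bandpass specs on |S21| in dB at five frequencies:
# |S21| <= -20 dB at the two stopband points, |S21| >= -3 dB at the three passband points.
upper = [-20.0, None, None, None, -20.0]
lower = [None, -3.0, -3.0, -3.0, None]
print(minimax_merit([-25.0, -2.0, -1.0, -2.5, -22.0], upper, lower))  # -0.5
```

Here all specifications are met with a worst-case margin of 0.5 dB; minimizing $U$ pushes the design towards the largest uniform margin.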

The conventional way of handling the design problem (8.1) is to employ the EM 
simulator directly within the optimization loop as illustrated in Fig. 8.1. This direct 
approach faces some fundamental difficulties. The most important one is the computational cost. EM simulation of a microwave device at a single design can take, depending on the system complexity, as long as several minutes, several hours or even a few days. On the other hand, typical (e.g., gradient-based) optimization algorithms may require dozens or hundreds of EM simulations, which makes the optimization impractical. Another difficulty is that the responses obtained through EM simulation typically have poor analytical properties. In particular, they contain a lot of numerical noise: the discretization topology of the simulated structure may change abruptly even for small changes of the design variables, which is caused by the adaptive meshing techniques utilized by most modern EM solvers. This, in turn, results in the discontinuity of the response function $R_f(x)$. An additional problem for direct EM-based optimization is that sensitivity information may be unavailable or expensive to compute. Only recently have computationally cheap adjoint sensitivities [8] started to become available in some major commercial EM simulation packages, although for frequency-domain solvers only [9], [10].

It has to be emphasized that probably the most common approach to simulation-driven design used by engineers and designers these days is repetitive parameter sweeps. Typically, a number of EM analyses are carried out while varying a single design variable. Such a process is then repeated for other variables. The information obtained through such parameter sweeps is combined with engineering experience in order to yield a refined design that satisfies the prescribed specifications. This process is quite tedious and time consuming and, of course, requires substantial designer interaction.
Fig. 8.1 Conventional simulation-driven design optimization: the EM solver is directly employed in the optimization loop. Each modification of the design requires additional simulation of the structure under consideration. Typical (e.g., gradient-based) optimization algorithms may require tens or hundreds of computationally expensive iterations.



In terms of automated EM-based design, conventional techniques are still in use, including gradient-based methods (e.g., quasi-Newton techniques [11]) as well as derivative-free approaches such as the Nelder-Mead algorithm [12]. In some areas, particularly antenna design, population-based search techniques are used, such as genetic algorithms [13], [14] or particle swarm optimizers [15], [16]. These algorithms are mostly exploited to handle issues such as multiple local optima in antenna-related design problems, although they suffer from substantial computational overhead. Probably the best picture of the state of the art in automated EM-simulation-based design optimization is given by the methods that are available in major commercial software packages such as CST Microwave Studio [9], Sonnet Software [17], HFSS [10], or FEKO [18]. All these packages offer traditional techniques including gradient-based algorithms, simplex search, or genetic algorithms. Practical use of these methods is quite limited.

One possible way of alleviating the difficulties of EM-simulation-based design optimization is the use of adjoint sensitivities. The adjoint sensitivity approach dates back to the 1960s work of Director and Rohrer [8]. Bandler et al. [19] also addressed adjoint circuit sensitivities, e.g., in the context of microwave design. Interest in EM-based adjoint calculations was revived after the work [20] was published. Since 2000, a number of interesting publications have addressed the application of the so-called adjoint variable method (AVM) to different numerical EM solvers. These include the time-domain transmission-line modeling (TLM) method [21], the finite-difference time-domain (FDTD) method [22], the finite-element method (FEM) [23], the method of moments (MoM) [24], the frequency-domain TLM [25], and the mode-matching method (MM) [26]. These approaches can be classified as either time-domain or frequency-domain adjoint variable methods. Adjoint sensitivity is an efficient way to speed up (and, in most cases, actually make feasible) gradient-based optimization using EM solvers, as the derivative information can be obtained with no extra EM simulation of the structure in question. As mentioned before, adjoint sensitivities are currently implemented in some major commercial EM simulation packages, particularly in CST Microwave Studio [9] and in HFSS [10]. At present, adjoint sensitivity is only available for frequency-domain solvers; however, CST plans to implement it in the time domain in one of its next releases.

Another way of improving the efficiency of simulation-driven design is circuit decomposition, i.e., breaking down an EM model into smaller parts and combining them in a circuit simulator to reduce the CPU intensity of the design process [27]-[29]. Co-simulation or co-optimization of EM/circuit models is a common industry solution to blend EM-simulated components into circuit models. In general, however, this is only a partial solution, because the EM-embedded co-simulation model is still subject to direct optimization.



8.3 Surrogate-Based Design Optimization 

It appears that computationally efficient simulation-driven design can be performed using surrogate models. Microwave design through surrogate-based optimization (SBO) [7], [30], [31] is the main focus of this chapter. Surrogate-based methods are treated in some detail in Chapter 3. Here, only some background information is presented. The primary reason for using the SBO approach in microwave engineering is to speed up the design process by shifting the optimization burden to an inexpensive yet reasonably accurate surrogate model of the device.

In the generic SBO framework described here, the direct optimization of the computationally expensive EM-simulated high-fidelity model $R_f$ is replaced by an iterative procedure [7], [32]

$$x^{(i+1)} = \arg\min_x U(R_s^{(i)}(x)) \qquad (8.2)$$

that generates a sequence of points (designs) $x^{(i)} \in X_f$, $i = 0, 1, \ldots$, being approximate solutions to the original design problem (8.1). Each $x^{(i+1)}$ is the optimal design of the surrogate model $R_s^{(i)} : X_s^{(i)} \to \mathbb{R}^m$, $X_s^{(i)} \subseteq \mathbb{R}^n$, $i = 0, 1, \ldots$. $R_s^{(i)}$ is assumed to be a computationally cheap and sufficiently reliable representation of the fine model $R_f$, particularly in the neighborhood of the current design $x^{(i)}$. Under these assumptions, the algorithm (8.2) is likely to produce a sequence of designs that quickly approach $x_f^*$.

Typically, $R_f$ is only evaluated once per iteration (at every new design $x^{(i+1)}$) for verification purposes and to obtain the data necessary to update the surrogate model. Since the surrogate model is computationally cheap, its optimization cost (cf. (8.2)) can usually be neglected and the total optimization cost is determined by the evaluation of $R_f$. The key point here is that the number of evaluations of $R_f$ for a well-performing surrogate-based algorithm is substantially smaller than for any direct optimization method (e.g., a gradient-based one) [9]. Figure 8.2 shows the block diagram of the SBO optimization process.
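The SBO loop (8.2) can be illustrated with a toy problem in Python. Everything here is hypothetical: `fine` and `coarse` are analytic stand-ins for $R_f$ and $R_c$, the surrogate is the coarse model with a design-variable shift $c$ re-fitted at each iteration (one simple choice among many), and both the shift fitting and the surrogate optimization use golden-section search. The point is the structure: one fine evaluation per iteration, with all other work done on the cheap model.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
FREQS = [1.5, 2.0, 2.5]   # hypothetical frequency sweep
TARGET_W = 2.0            # hypothetical spec: the response dip should sit at this frequency


def golden_min(g: Callable[[float], float], a: float, b: float, tol: float = 1e-10) -> float:
    """Golden-section search for the minimizer of a unimodal g on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2.0


def fine(x: float) -> Vector:     # R_f: stand-in for the expensive EM model (dip at 0.9x + 0.3)
    return [(w - (0.9 * x + 0.3)) ** 2 for w in FREQS]


def coarse(x: float) -> Vector:   # R_c: cheap, misaligned low-fidelity model (dip at x)
    return [(w - x) ** 2 for w in FREQS]


def merit(r: Vector) -> float:    # U: response level at the target frequency (smaller is better)
    return r[FREQS.index(TARGET_W)]


def sbo(x0: float, iterations: int) -> Tuple[float, int]:
    x, fine_calls = x0, 0
    for _ in range(iterations):
        rf = fine(x)              # one high-fidelity evaluation per iteration
        fine_calls += 1
        # update the surrogate R_s^(i)(z) = R_c(z + c) so that it agrees with R_f at x^(i)
        c = golden_min(lambda s: sum((a - b) ** 2 for a, b in zip(rf, coarse(x + s))), -1.0, 1.0)
        # (8.2): the next design is the optimum of the cheap surrogate
        x = golden_min(lambda z: merit(coarse(z + c)), 0.0, 4.0)
    return x, fine_calls


x_best, n_fine = sbo(2.0, 6)
print(x_best, n_fine, merit(fine(x_best)))  # the last call is an extra verification of x_best
```

In this toy case the fine optimum is $x_f^* = 17/9 \approx 1.8889$, and the iterates 2.0, 1.9, 1.89, 1.889, ... approach it using one fine evaluation per iteration; how fast a real SBO run converges depends on how well the surrogate captures the fine model.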

If the surrogate model satisfies zero- and first-order consistency conditions with the fine model, i.e., $R_s^{(i)}(x^{(i)}) = R_f(x^{(i)})$ and $(\partial R_s^{(i)}/\partial x)(x^{(i)}) = (\partial R_f/\partial x)(x^{(i)})$ (verification of the latter requires $R_f$ sensitivity data), and the algorithm (8.2) is enhanced by the trust region method [33], then it is provably convergent to a local fine model optimum [34]. Convergence can also be guaranteed if the algorithm (8.2) is enhanced by properly selected local search methods [35]. Space mapping [7], [30], [36], [37] is an example of a surrogate-based methodology that does not normally rely on the aforementioned enhancements; however, it requires the surrogate model to be constructed from the physically-based coarse model [7]. This usually gives remarkably good performance in the sense of the space mapping algorithm being able to quickly locate a satisfactory design. Unfortunately, space mapping suffers from convergence problems [38], and it is sensitive to the quality of the coarse model and the type of transformations used to create the surrogate [39].
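The zero- and first-order consistency conditions can be illustrated with one simple (assumed) construction: a coarse model plus a first-order additive correction built from the fine-model value and sensitivity at $x^{(i)}$. The sketch uses hypothetical scalar models; for a response vector the same correction would be applied component by component. It checks only the two consistency conditions, not the trust-region safeguard that the convergence result also requires.

```python
from typing import Callable

Scalar = Callable[[float], float]


def first_order_surrogate(rc: Scalar, drc: Scalar, rf_xi: float, drf_xi: float, xi: float) -> Scalar:
    """Coarse model plus a first-order additive correction built at x^(i).

    R_s(x) = R_c(x) + [R_f(x_i) - R_c(x_i)] + [R_f'(x_i) - R_c'(x_i)] * (x - x_i),
    which matches R_f in value (zero order) and slope (first order) at x_i.
    """
    d0 = rf_xi - rc(xi)
    d1 = drf_xi - drc(xi)
    return lambda x: rc(x) + d0 + d1 * (x - xi)


# Hypothetical scalar models.
rf = lambda x: (x - 2.1) ** 2 + 0.3    # "fine" response
drf = lambda x: 2.0 * (x - 2.1)        # fine sensitivity (e.g., from adjoints)
rc = lambda x: (x - 2.0) ** 2          # "coarse" response
drc = lambda x: 2.0 * (x - 2.0)

xi = 1.5
rs = first_order_surrogate(rc, drc, rf(xi), drf(xi), xi)
h = 1e-6
print(rs(xi) - rf(xi))                                 # zero-order mismatch, about 0
print((rs(xi + h) - rs(xi - h)) / (2 * h) - drf(xi))   # first-order mismatch, about 0
```

Because both toy models share the same curvature, this particular surrogate even has its minimum exactly at the fine optimum $x = 2.1$; in general the correction is trustworthy only near $x^{(i)}$, which is why a trust region is needed.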
Fig. 8.2 Surrogate-based simulation-driven design optimization: the optimization burden is
shifted to the computationally cheap surrogate model which is updated and re-optimized at 
each iteration of the main optimization loop. High-fidelity EM simulation is only performed 
once per iteration to verify the design produced by the surrogate model and to update the sur- 
rogate itself. The number of iterations for a well-performing SBO algorithm is substantially 
smaller than for conventional techniques. 



8.4 Surrogate Models for Microwave Engineering 

There are a number of ways to create surrogate models of microwave and radio- 
frequency (RF) devices and structures. They can be classified into two groups: 
functional and physical surrogates. Functional models are constructed from sam- 
pled high-fidelity model data using suitable function approximation techniques. 
Physical surrogates exploit fast but limited-accuracy models that are physically re- 
lated to the original structure under consideration. 

Functional surrogate models can be created using various function approximation techniques including low-order polynomials [40], radial basis functions [40], kriging [31], fuzzy systems [41], support-vector regression [42], [43], and neural networks [44]-[46], the last one probably being the most popular and successful approach in this group. Approximation models are very fast; unfortunately, achieving good modeling accuracy requires a large amount of training data obtained through massive EM simulations. Moreover, the number of data pairs necessary to ensure sufficient accuracy grows exponentially with the number of design variables. Practical models based on function approximation techniques may need hundreds or even thousands of EM simulations in order to ensure reasonable accuracy. This is justified in the case of library models created for multiple usage but not so much in the case of ad hoc surrogates created for specific tasks such as parametric optimization, yield-driven design, and/or statistical analysis at a given (e.g., optimal) design.

Physical surrogates are based on underlying physically-based low-fidelity models of the structure of interest (denoted here as $R_c$). Physically-based models describe the same physical phenomena as the high-fidelity model, however, in a simplified manner. In microwave engineering, the high-fidelity model describes the behavior of the system in terms of the distributions of the electric and magnetic fields within the structure (and sometimes in its surroundings) that are calculated by solving the corresponding set of Maxwell's equations [47]. Furthermore, the system performance is expressed through certain characteristics related to its input/output ports (such as the so-called S-parameters [47]). All of these are obtained as a result of high-resolution electromagnetic simulation where the structure under consideration is finely discretized. In this context, the physically-based low-fidelity model of the microwave device can be obtained through:

• Analytical description of the structure using theory-based or semi-empirical 
formulas, 

• A different level of physical description of the system. The typical example in microwave engineering is the equivalent circuit [7], where the device of interest is represented using lumped components (inductors, capacitors, microstrip line models, etc.) with the operation of the circuit described directly by impedances, voltages and currents; electromagnetic fields are not directly considered,

• Low-fidelity electromagnetic simulation. This approach allows us to use the same EM solver to evaluate both the high- and low-fidelity models; however, the latter uses a much coarser simulation mesh, which results in degraded accuracy but much shorter simulation time.

The three groups of models have different characteristics. While analytical and equivalent-circuit models are computationally cheap, they may lack accuracy, and they are typically not available for structures such as antennas and substrate-integrated circuits. On the other hand, coarsely-discretized EM models are available for any device. They are typically accurate but relatively expensive. The cost is a major bottleneck in adopting coarsely-discretized EM models for surrogate-based optimization in microwave engineering. One workaround is to build a function-approximation model using coarse-discretization EM-simulation data (using, e.g., kriging [31]). This, however, requires dense sampling of the design space and should only be done locally to avoid excessive CPU cost. Table 8.1 summarizes the characteristics of the low-fidelity models available in microwave engineering. A common feature of physically-based low-fidelity models is that the amount of high-fidelity model data necessary to build a reliable surrogate model is much smaller than in the case of functional surrogates [48].
Table 8.1 Physically-based low-fidelity models in microwave engineering

| Model Type | CPU Cost | Accuracy | Availability |
|---|---|---|---|
| Analytical | Very cheap | Low | Rather limited |
| Equivalent circuit | Cheap | Decent | Limited (mostly filters) |
| Coarsely-discretized EM simulation | Expensive | Good to very good | Generic: available for all structures |



Consider an example microstrip bandpass filter [48] shown in Fig. 8.3(a). The high-fidelity filter model is simulated using the EM solver FEKO [18]. The low-fidelity model is an equivalent circuit implemented in Agilent ADS [49] (Fig. 8.3(b)). Figure 8.4(a) shows the responses (here, the modulus of the transmission coefficient, $|S_{21}|$, versus frequency) of both models at a certain reference design $x^{(0)}$. While having a similar shape, the responses are severely misaligned. Figure 8.4(b) shows the responses of the high-fidelity model and the surrogate constructed using the low-fidelity model and space mapping [48]. The surrogate is built using a single training point (high-fidelity model data at $x^{(0)}$) and exhibits very good matching with the high-fidelity model at $x^{(0)}$. Figure 8.4(c) shows the high-fidelity and surrogate model responses at a different design: the good alignment between the models is still maintained. This comes from the fact that the physically-based low-fidelity model has properties similar to those of the high-fidelity one, so local model alignment usually results in relatively good global matching.
Fig. 8.3 Microstrip bandpass filter [48]: (a) geometry, (b) low-fidelity circuit model.
surrogate model was constructed using a single high-fidelity model response (at $x^{(0)}$), but good matching between the models is preserved even away from the reference design, which is due to the fact that the low-fidelity model is physically based.
8.5 Microwave Simulation-Driven Design Exploiting Physically-Based Surrogates

In this section, several techniques for computationally efficient simulation-driven design of microwave structures are presented. The focus is on approaches that exploit the SBO framework (8.2) and the surrogate model constructed using an underlying physically-based low-fidelity model. The discussion covers the following methods: space mapping [7], [30], simulation-based tuning [50], shape-preserving response prediction [51], variable-fidelity optimization [52], as well as optimization through adaptively adjusted design specifications [53].



8.5.1 Space Mapping 

Space mapping (SM) [7], [30] is probably one of the most recognized SBO techniques using physically-based low-fidelity (or coarse) models in microwave engineering. Space mapping exploits the algorithm (8.2) to generate a sequence of approximate solutions $x^{(i)}$, $i = 0, 1, 2, \ldots$, to problem (8.1). The surrogate model at iteration $i$, $R_s^{(i)}$, is constructed from the low-fidelity model so that the misalignment between $R_s^{(i)}$ and the fine model is minimized using the so-called parameter extraction process, which is a nonlinear minimization problem in itself [7]. The surrogate is defined as [30]

$$R_s^{(i)}(x) = R_{s.g}(x, p^{(i)}) \qquad (8.3)$$

where $R_{s.g}$ is a generic space mapping surrogate model, i.e., the low-fidelity model composed with suitable transformations, whereas

$$p^{(i)} = \arg\min_p \sum_{k=0}^{i} w_{i.k} \left\| R_f(x^{(k)}) - R_{s.g}(x^{(k)}, p) \right\| \qquad (8.4)$$

is a vector of model parameters and $w_{i.k}$ are weighting factors; a common choice of $w_{i.k}$ is $w_{i.k} = 1$ for all $i$ and all $k$.
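For the simple additive (output) form $R_{s.g}(x, d) = R_c(x) + d$, parameter extraction (8.4) has a closed-form solution if, as assumed here, the norm in (8.4) is replaced by the squared Euclidean norm: $d$ is then the weighted mean of the residuals $R_f(x^{(k)}) - R_c(x^{(k)})$. With the unsquared norm as written in (8.4), or with nonlinear mappings, an iterative minimizer would be needed instead. The function name, models, and data below are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def extract_output_shift(fine_data: Sequence[Tuple[Vector, Vector]],
                         coarse: Callable[[Vector], Vector],
                         weights: Optional[Sequence[float]] = None) -> Vector:
    """Parameter extraction (8.4) for the output-SM surrogate R_c(x) + d.

    Assumes the squared Euclidean norm in (8.4); the weighted least-squares
    solution is then the weighted mean of the residuals R_f(x^(k)) - R_c(x^(k)).
    """
    if weights is None:
        weights = [1.0] * len(fine_data)      # common choice: w_{i.k} = 1
    total = float(sum(weights))
    d = [0.0] * len(fine_data[0][1])
    for w, (x, rf) in zip(weights, fine_data):
        rc = coarse(x)                        # coarse calls are cheap
        for j in range(len(d)):
            d[j] += w * (rf[j] - rc[j]) / total
    return d


# Hypothetical two-frequency coarse model and two stored fine-model evaluations.
coarse = lambda x: [x[0], 2.0 * x[0]]
fine_data = [([1.0], [1.5, 2.0]), ([2.0], [2.5, 4.2])]
d = extract_output_shift(fine_data, coarse)
surrogate = lambda x: [r + dj for r, dj in zip(coarse(x), d)]   # R_s^(i)(x), cf. (8.3)
print(d)
```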

Various space mapping surrogate models are available [7], [30]. They can be roughly categorized into four groups: (i) models based on a (usually linear) distortion of the coarse model parameter space, e.g., input space mapping of the form $R_{s.g}(x, p) = R_{s.g}(x, B, c) = R_c(B \cdot x + c)$ [7]; (ii) models based on a distortion of the coarse model response, e.g., output space mapping of the form $R_{s.g}(x, p) = R_{s.g}(x, d) = R_c(x) + d$ [30]; (iii) implicit space mapping, where the parameters used to align the surrogate with the fine model are separate from the design variables, i.e., $R_{s.g}(x, p) = R_{s.g}(x, x_p) = R_{c.i}(x, x_p)$, with $R_{c.i}$ being the coarse model dependent on both the design variables $x$ and the so-called preassigned parameters $x_p$ (e.g., dielectric constant, substrate height) that are normally fixed in the fine model but can be freely altered in the coarse model [30]; (iv) custom models exploiting parameters characteristic of a given design problem; the most characteristic example is the so-called frequency space mapping $R_{s.g}(x, p) = R_{s.g}(x, F) = R_{c.f}(x, F)$ [7], where $R_{c.f}$ is a frequency-mapped coarse model, i.e., the coarse model evaluated at frequencies different from the original frequency sweep for the fine model, according to the mapping $\omega \to f_1 + f_2 \omega$, with $F = [f_1 \; f_2]^T$.

Space mapping usually comprises combined transformations. For instance, a surrogate model employing input, output, and frequency SM transformations would be $R_{s.g}(x, p) = R_{s.g}(x, c, d, F) = R_{c.f}(x + c, F) + d$. The rationale for this is that a properly chosen mapping may significantly improve the performance of the space mapping algorithm; however, the optimal selection of the mapping type for a given design problem is not trivial [38]. Work has been done to ease the selection process for a given design problem [39], [48]. However, regardless of the mapping choice, coarse model accuracy is what principally affects the performance of the space mapping design process. One can quantify the quality of the surrogate model through rigorous convergence conditions [38]. These conditions, although useful for developing more efficient space mapping algorithms and automatic surrogate model selection techniques, cannot usually be verified because of the limited amount of data available from the fine model. In practice, the most important criterion for assessing the quality or accuracy of the coarse model is still visual inspection of the fine and coarse model responses at certain points and/or examining absolute error measures such as $\|R_f(x) - R_c(x)\|$.
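The input, output, and frequency mappings, and their combination $R_{c.f}(x + c, F) + d$, can be written as small wrappers around a coarse model, together with the error measure $\|R_f(x) - R_c(x)\|$. In this Python sketch the coarse model is assumed to be a point-wise function `rc(x, omega)`, so frequency mapping amounts to evaluating it at $f_1 + f_2\omega$. All function names and the toy model are hypothetical, and implicit space mapping is omitted because it needs a coarse model with explicit preassigned parameters.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
PointModel = Callable[[Sequence[float], float], float]   # r_c(x, omega): response at one frequency
Model = Callable[[Sequence[float], Sequence[float]], Vector]


def sweep(rc: PointModel) -> Model:
    """Plain coarse model R_c: evaluate r_c over a frequency sweep."""
    return lambda x, omegas: [rc(x, w) for w in omegas]


def input_sm(rc: PointModel, B: List[Vector], c: Vector) -> Model:
    """Input SM: R_c(B x + c)."""
    def model(x, omegas):
        z = [sum(B[r][k] * x[k] for k in range(len(x))) + c[r] for r in range(len(c))]
        return [rc(z, w) for w in omegas]
    return model


def output_sm(rc: PointModel, d: Vector) -> Model:
    """Output SM: R_c(x) + d, with one offset per frequency point."""
    return lambda x, omegas: [rc(x, w) + dj for w, dj in zip(omegas, d)]


def frequency_sm(rc: PointModel, F: Tuple[float, float]) -> Model:
    """Frequency SM: the coarse model evaluated at f1 + f2 * omega."""
    f1, f2 = F
    return lambda x, omegas: [rc(x, f1 + f2 * w) for w in omegas]


def combined_sm(rc: PointModel, c: Vector, F: Tuple[float, float], d: Vector) -> Model:
    """Input shift + frequency + output SM: R_c.f(x + c, F) + d."""
    f1, f2 = F
    def model(x, omegas):
        xs = [xi + ci for xi, ci in zip(x, c)]
        return [rc(xs, f1 + f2 * w) + dj for w, dj in zip(omegas, d)]
    return model


def misalignment(Rf: Model, Rs: Model, x: Sequence[float], omegas: Sequence[float]) -> float:
    """Euclidean error ||R_f(x) - R_s(x)|| over the sweep."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(Rf(x, omegas), Rs(x, omegas))))


# Hypothetical single-variable coarse model with a response dip at omega = x[0].
rc = lambda x, w: (w - x[0]) ** 2
omegas = [0.0, 1.0, 2.0]
print(input_sm(rc, [[2.0]], [0.5])([1.0], omegas))     # [6.25, 2.25, 0.25]
print(frequency_sm(rc, (0.5, 2.0))([1.0], omegas))     # [0.25, 2.25, 12.25]
```

Keeping each mapping as a separate wrapper makes it easy to try different combinations for a given problem, which reflects the point above that the best mapping type is problem dependent.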

The coarse model is the most important factor that affects the performance of the space mapping algorithm, mainly through two of its characteristics. The first is accuracy. Coarse model accuracy (more generally, the accuracy of the space mapping surrogate [38]) is the main factor that determines the efficiency of the algorithm in terms of finding a satisfactory design. The more accurate the coarse model, the smaller the number of fine model evaluations necessary to complete the optimization process. If the coarse model is insufficiently accurate, the space mapping algorithm may need more fine model evaluations or may even fail to find a good quality design.

The second important characteristic is the evaluation cost. It is essential that the 
coarse model is computationally much cheaper than the fine model because both 
parameter extraction (8.4) and surrogate optimization (8.2) require large numbers 
of coarse model evaluations. Ideally, the evaluation cost of the coarse model 
should be negligible when compared to the evaluation cost of the fine model, in 
which case the total computational cost of the space mapping optimization process 
is merely determined by the necessary number of fine model evaluations. If the 
evaluation time of the coarse model is too high, say, larger than 1% of the fine model evaluation time, the computational cost of surrogate model optimization and, especially, parameter extraction starts to play an important role in the total cost of space mapping optimization and may even determine it. Therefore, the practical applicability of space mapping is limited to situations where the coarse model is computationally much cheaper than the fine model. The majority of SM models reported in the literature (e.g., [7], [30], [36]) concern microstrip filters, transformers or junctions, where fast and reliable equivalent circuit coarse models are easily available.
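The 1% rule of thumb can be made concrete with simple bookkeeping: express the coarse-model evaluation time as a fraction of the fine-model evaluation time and count all coarse calls. The iteration and call counts below are illustrative assumptions, not data from this chapter.

```python
def cost_in_fine_evals(n_fine: int, n_coarse: int, ratio: float) -> float:
    """Total SM optimization cost measured in equivalent fine-model evaluations.

    ratio = (coarse-model evaluation time) / (fine-model evaluation time).
    """
    return n_fine + n_coarse * ratio


# Hypothetical run: 10 iterations, each with one fine evaluation and about 2000 coarse
# evaluations spent on parameter extraction (8.4) and surrogate optimization (8.2).
n_fine, n_coarse = 10, 10 * 2000
for ratio in (1e-4, 1e-2):
    print(f"ratio={ratio:g}: {cost_in_fine_evals(n_fine, n_coarse, ratio):.1f} fine-evaluation equivalents")
```

With these assumed counts, a ratio of 0.01% adds only about 2 fine-evaluation equivalents to the 10 fine evaluations, whereas a ratio of 1% adds about 200, so the coarse-model work, not the fine model, would dominate the total cost.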

8.5.2 Simulation-Based Tuning and Tuning Space Mapping 

Tuning is ubiquitous in engineering practice. It is usually associated with the process of manipulating free or tunable parameters of a device or system after that device or system has been manufactured. The traditional purpose of permitting tunable elements is (i) to facilitate user flexibility in achieving a desired response or behavior from a manufactured outcome during its operation, or (ii) to correct inevitable postproduction manufacturing defects, small ones due perhaps to tolerances, or large ones due perhaps to faults in the manufacturing process [54]. Tuning of an engineering design can be seen, in essence, as a user- or robot-directed optimization process.
Tuning space mapping (TSM) [50] combines the concept of tuning, widely used in microwave engineering [55], [56], with space mapping. It is an iterative optimization procedure that assumes the existence of two surrogate models: both are less accurate but computationally much cheaper than the fine model. The first model is a so-called tuning model $R_t$ that contains relevant fine model data (typically a fine model response) at the current iteration point and tuning parameters (typically implemented through circuit elements inserted into tuning ports). The tunable parameters are adjusted so that the model $R_t$ satisfies the design specifications. The second model, $R_c$, is used for calibration purposes: it allows us to translate the change of the tuning parameters into relevant changes of the actual design variables. $R_c$ is dependent on three sets of variables: design parameters, tuning parameters (which are actually the same parameters as the ones used in $R_t$), and SM parameters that are adjusted using the usual parameter extraction process [7] in order to have the model $R_c$ meet certain matching conditions. Typically, the model $R_c$ is a standard SM surrogate (i.e., a coarse model composed with suitable transformations) enhanced by the same or corresponding tuning elements as the model $R_t$. Conceptual illustrations of the fine model, the tuning model and the calibration model are shown in Fig. 8.5.

The iteration of the TSM algorithm consists of two steps: optimization of the tuning model and a calibration procedure. First, the current tuning model $R_t^{(i)}$ is built using fine model data at point $x^{(i)}$. In general, because the fine model with inserted tuning ports is not identical to the original structure, the tuning model response may not agree with the response of the fine model at $x^{(i)}$ even if the values of the tuning parameters $x_t$ are zero, so these values must be adjusted to, say, $x_{t.0}^{(i)}$, in order to obtain alignment [50]:

$$x_{t.0}^{(i)} = \arg\min_{x_t} \left\| R_f(x^{(i)}) - R_t^{(i)}(x_t) \right\| \qquad (8.5)$$

In the next step, one optimizes $R_t^{(i)}$ to have it meet the design specifications. Optimal values of the tuning parameters $x_{t.1}^{(i)}$ are obtained as follows:

$$x_{t.1}^{(i)} = \arg\min_{x_t} U(R_t^{(i)}(x_t)) \qquad (8.6)$$

Having $x_{t.1}^{(i)}$, the calibration procedure is performed to determine changes in the design variables that yield the same change in the calibration model response as that caused by the change of the tuning parameters from $x_{t.0}^{(i)}$ to $x_{t.1}^{(i)}$ [50]. First, one adjusts the SM parameters $p^{(i)}$ of the calibration model to obtain a match with the fine model response at $x^{(i)}$:

$$p^{(i)} = \arg\min_p \left\| R_f(x^{(i)}) - R_c(x^{(i)}, p, x_{t.0}^{(i)}) \right\| \qquad (8.7)$$

The calibration model is then optimized with respect to the design variables in or-
der to obtain the next iteration point $x^{(i+1)}$:

$$x^{(i+1)} = \arg\min_{x} \left\| R_t^{(i)}(x_{t.1}^{(i)}) - R_c(x, p^{(i)}, x_{t.0}^{(i)}) \right\| \qquad (8.8)$$

Note that $x_{t.0}^{(i)}$ is used in (8.7), which corresponds to the state of the tuning model
after performing the alignment procedure (8.5), and $x_{t.1}^{(i)}$ is used in (8.8), which
corresponds to the optimized tuning model (cf. (8.6)). Thus, (8.7) and (8.8) allow
finding the change of design variable values $x^{(i+1)} - x^{(i)}$ necessary to compensate
the effect of changing the tuning parameters from $x_{t.0}^{(i)}$ to $x_{t.1}^{(i)}$.
Fig. 8.5 Conceptual illustrations of the fine model, the tuning model and the calibration
model: (a) the fine model is typically based on full-wave simulation, (b) the tuning model
exploits the fine model "image" (e.g., in the form of S-parameters corresponding to the cur-
rent design imported to the tuning model using suitable data components) and a number of
circuit-theory-based tuning elements, (c) the calibration model is usually a circuit equiva-
lent dependent on the same design variables as the fine model, the same tuning parameters
as the tuning model and, additionally, a set of space mapping parameters used to align the
calibration model with both the fine and the tuning model during the calibration process.



It should be noted that the calibration procedure described here represents the
most generic approach. In some cases, there is a formula that establishes an ana-
lytical relation between the design variables and the tuning parameters, so that the
updated design can be found simply by applying that formula [50]. In particular,
the calibration formula may be just a linear function, so that
$x^{(i+1)} = x^{(i)} + s * (x_{t.1}^{(i)} - x_{t.0}^{(i)})$, where $s$ is a real vector and $*$ denotes a Hadamard
product (i.e., component-wise multiplication) [50]. If analytical calibration is
possible, there is no need to use the calibration model. Other approaches to the
calibration process can be found in the literature [50], [57]. In some cases (e.g.,
[57]), the tuning parameters may be in an identity relation with the design variables,
which simplifies the implementation of the algorithm.
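When the linear analytical calibration applies, the design update reduces to one line of array arithmetic. The sketch below is a minimal Python/NumPy illustration, assuming one tuning parameter per design variable (so $x$, $x_t$ and $s$ have equal length); the function name and the numbers in the usage line are hypothetical.

```python
import numpy as np


def linear_calibration(x, x_t0, x_t1, s):
    """Analytical calibration x^(i+1) = x^(i) + s * (x_t.1^(i) - x_t.0^(i)).

    Assumes one tuning parameter per design variable, so all inputs have the
    same length. NumPy's * on arrays is element-wise, i.e., a Hadamard product.
    """
    x, x_t0, x_t1, s = (np.asarray(v, dtype=float) for v in (x, x_t0, x_t1, s))
    if not (x.shape == x_t0.shape == x_t1.shape == s.shape):
        raise ValueError("x, x_t0, x_t1 and s must have the same shape")
    return x + s * (x_t1 - x_t0)


# Hypothetical two-variable example: each tuning change is scaled by its own s_k.
x_next = linear_calibration([400.0, 25.0], [0.0, 0.0], [2.0, -0.5], [90.0, 4.0])
# x_next holds 580.0 and 23.0
```

No calibration model or extra optimization is involved here, which is exactly why the analytical route is preferred when such a formula is known.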
The operation of the tuning space mapping algorithm can be clarified using a
simple example of a microstrip transmission line [50]. The fine model is imple-
mented in Sonnet em [17] (Fig. 8.6(a)), and the fine model response is taken as the
inductance of the line as a function of the line's length. The original length of the
line is chosen to be $x^{(0)} = 400$ mil with a width of 0.635 mm. The goal is to find a
length of line such that the corresponding inductance is 6.5 nH at 300 MHz. The
Sonnet em simulation at $x^{(0)}$ gives the value of 4.38 nH, i.e., $R_f(x^{(0)}) = 4.38$ nH.
Fig. 8.6 TSM optimization of the microstrip line [50]: (a) original structure of the micro-
strip line in Sonnet, (b) the microstrip line after being divided and with the co-calibrated
ports inserted, (c) tuning model, (d) calibration model.



The tuning model $R_t$ is developed by dividing the structure in Fig. 8.6(a) into two
separate parts and adding the two tuning ports as shown in Fig. 8.6(b). A small induc-
tor is then inserted between these ports as a tuning element. The tuning model is im-
plemented in Agilent ADS [47] and shown in Fig. 8.6(c). The model contains the fine
model data at the initial design in the form of the S4P element as well as the tuning
element (inductor). Because of Sonnet's co-calibrated ports technology [56], there is
perfect agreement between the fine and tuning model responses when the value of
the tuning inductance is zero, so that $x_{t.0}^{(0)}$ is zero in this case.

Next, the tuning model should be optimized to meet the target inductance of 6.5
nH. The optimized value of the tuning inductance is $x_{t.1}^{(0)} = 2.07$ nH.

The calibration model is shown in Fig. 8.6(d). Here, the dielectric constant of the
microstrip element is used as a space mapping parameter $p$. The original value of this
parameter, 9.8, is adjusted using (8.7) to 23.7 so that the response of the calibration
model is 4.38 nH at 400 mil, i.e., it agrees with the fine model response at $x^{(0)}$.
Now, the new value of the microstrip length is obtained using (8.8). In particular,
one optimizes $x$ with the tuning inductance set to $x_{t.0}^{(0)} = 0$ nH to match the total in-
ductance of the calibration model to the optimized tuning model response, 6.5 nH.
The result is $x^{(1)} = 585.8$ mil; the fine model response at $x^{(1)}$ obtained by Sonnet em
simulation is 6.48 nH. This result can be further improved by performing a second
iteration of the TSM, which gives the length of the microstrip line equal to
$x^{(2)} = 588$ mil and its corresponding inductance of 6.5 nH.
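To make the sequence (8.5) to (8.8) concrete without an EM solver, the sketch below runs a toy, scalar analogue of the microstrip example. Everything in it is a hypothetical stand-in: the quadratic `fine_model` is not Sonnet data, the tuning model is assumed to add the tuning inductance in series with perfect alignment (so $x_{t.0} = 0$, mirroring the co-calibrated ports case), and the calibration model $R_c(x, p, x_t) = p\,x + x_t$ uses a single SM parameter $p$. These choices give every arg min a closed form.

```python
def fine_model(length_mil):
    """Hypothetical 'fine model': inductance in nH of a line of given length in mil."""
    return 0.01 * length_mil + 2e-6 * length_mil**2


def tsm_toy(x, target_nh, iterations=6):
    """Toy tuning space mapping loop; one fine-model evaluation per iteration."""
    for _ in range(iterations):
        rf = fine_model(x)
        # (8.5) alignment: R_t(x_t) = rf + x_t already matches rf at x_t = 0.
        x_t0 = 0.0
        # (8.6) tuning-model optimization: choose x_t so that R_t(x_t) = target.
        x_t1 = target_nh - rf
        # (8.7) parameter extraction: p such that R_c(x, p, x_t0) = p*x + x_t0 equals rf.
        p = (rf - x_t0) / x
        # (8.8) calibration: new x with R_c(x_new, p, x_t0) = R_t(x_t1) = rf + x_t1.
        x = (rf + x_t1 - x_t0) / p
    return x


x_final = tsm_toy(400.0, 6.5)
print(f"length = {x_final:.1f} mil, inductance = {fine_model(x_final):.3f} nH")
```

The toy only illustrates the mechanics of the update: the tuning step finds the required change in response, and the calibration step converts it into a change of the design variable.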

Simulation-based tuning and tuning space mapping can be extremely efficient,
as demonstrated in Chapter 12. In particular, a satisfactory design can be obtained
after just one or two iterations. However, the tuning methodology has limited
applications. It is well suited for structures such as microstrip filters, but it can
hardly be applied to radiating structures (antennas). Also, tuning of cross-
sectional parameters (e.g., microstrip width) is not straightforward [50]. Further-
more, the tuning procedure is invasive in the sense that the structure may
need to be cut. The fine model simulator must allow such cuts and allow tuning
elements to be inserted. This can be done using, e.g., Sonnet em [17]. Also, EM
simulation of a structure containing a large number of tuning ports is computation-
ally far more expensive than the simulation of the original structure (without the
ports). Depending on the number of design variables, the number of tuning ports
may be as large as 30, 50 or more [50], which may increase the simulation time by
one order of magnitude or more. Nevertheless, recent results presented in [58] indi-
cate the possibility of speeding up the tuning process by using so-called reduced
structures.



8.5.3 Shape-Preserving Response Prediction 

Shape-preserving response prediction (SPRP) [51] is a response correction tech- 
nique that takes advantage of the similarity between responses of the high- and 
low-fidelity models in a very straightforward way. SPRP assumes that the change 
of the high-fidelity model response due to the adjustment of the design variables 
can be predicted using the actual changes of the low-fidelity model response. 
Therefore, it is critically important that the low-fidelity model is physically based, 
which ensures that the effect of the design parameter variations on the model re- 
sponse is similar for both models. In microwave engineering this property is likely 
to hold, particularly if the low-fidelity model is the coarsely discretized struc-
ture evaluated using the same EM solver as the one used to simulate the high-
fidelity model.

The change of the low-fidelity model response is described by the translation
vectors corresponding to a certain (finite) number of characteristic points of the
model's response. These translation vectors are subsequently used to predict the
change of the high-fidelity model response, with the actual response of $R_f$ at the
current iteration point, $R_f(x^{(i)})$, treated as a reference.

Figure 8.7(a) shows an example low-fidelity model response, $|S_{21}|$ in the fre-
quency range 8 GHz to 18 GHz, at the design $x^{(i)}$, as well as the low-fidelity model
response at some other design $x$. The responses come from the double folded stub
bandstop filter example considered in [51]. Circles denote characteristic points of
$R_c(x^{(i)})$, selected here to represent $|S_{21}| = -3$ dB, $|S_{21}| = -20$ dB, and the local $|S_{21}|$
maximum (at about 13 GHz). Squares denote the corresponding characteristic points
for $R_c(x)$, while line segments represent the translation vectors ("shift") of the
characteristic points of $R_c$ when changing the design variables from $x^{(i)}$ to $x$. Since
the low-fidelity model is physically based, the high-fidelity model response at the
given design, here $x$, can be predicted using the same translation vectors applied
to the corresponding characteristic points of the high-fidelity model response at
$x^{(i)}$, $R_f(x^{(i)})$. This is illustrated in Fig. 8.7(b).

Rigorous formulation of SPRP uses the following notation concerning the re-
sponses: $R_f(x) = [R_f(x,\omega_1)\ \ldots\ R_f(x,\omega_m)]^T$ and $R_c(x) = [R_c(x,\omega_1)\ \ldots\ R_c(x,\omega_m)]^T$,
where $\omega_j$, $j = 1, \ldots, m$, is the frequency sweep. Let $p_j^f = [\omega_j^f\ r_j^f]^T$,
$p_j^{c0} = [\omega_j^{c0}\ r_j^{c0}]^T$, and $p_j^c = [\omega_j^c\ r_j^c]^T$, $j = 1, \ldots, K$, denote the sets of
characteristic points of $R_f(x^{(i)})$, $R_c(x^{(i)})$ and $R_c(x)$, respectively. Here, $\omega$ and $r$
denote the frequency and magnitude components of the respective point. The
translation vectors of the low-fidelity model response are defined as
$t_j = [\Delta\omega_j\ \Delta r_j]^T$, $j = 1, \ldots, K$, where $\Delta\omega_j = \omega_j^c - \omega_j^{c0}$ and $\Delta r_j = r_j^c - r_j^{c0}$.

The shape-preserving response prediction surrogate model is defined as follows:

$$R_s^{(i)}(x) = [R_s^{(i)}(x,\omega_1)\ \ldots\ R_s^{(i)}(x,\omega_m)]^T \qquad (8.9)$$

where

$$R_s^{(i)}(x,\omega_j) = \bar{R}_f\big(x^{(i)}, F(\omega_j, \{-\Delta\omega_k\}_{k=1}^{K})\big) + \mathcal{R}\big(\omega_j, \{\Delta r_k\}_{k=1}^{K}\big) \qquad (8.10)$$

for $j = 1, \ldots, m$. $\bar{R}_f(x,\omega)$ is an interpolation of $\{R_f(x,\omega_1), \ldots, R_f(x,\omega_m)\}$ onto the
frequency interval $[\omega_1, \omega_m]$.

The scaling function $F$ interpolates the data pairs $\{\omega_1, \omega_1\}$, $\{\omega_1^f, \omega_1^f - \Delta\omega_1\}$, ...,
$\{\omega_K^f, \omega_K^f - \Delta\omega_K\}$, $\{\omega_m, \omega_m\}$ onto the frequency interval $[\omega_1, \omega_m]$. The function
$\mathcal{R}$ does a similar interpolation for the data pairs $\{\omega_1, \Delta r_0\}$, $\{\omega_1^f, \Delta r_1\}$, ...,
$\{\omega_K^f, \Delta r_K\}$, $\{\omega_m, \Delta r_{K+1}\}$; here $\Delta r_0 = R_c(x,\omega_1) - R_c(x^{(i)},\omega_1)$ and
$\Delta r_{K+1} = R_c(x,\omega_m) - R_c(x^{(i)},\omega_m)$. In other words, the function $F$ translates the
frequency components of the characteristic points of $R_f(x^{(i)})$ to the frequencies at
which they should be located according to the translation vectors $t_j$, while the
function $\mathcal{R}$ adds the necessary magnitude component.
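A minimal sketch of the SPRP surrogate (8.9), (8.10) in Python/NumPy follows. It assumes that the characteristic points have already been extracted and matched (their extraction is not shown), that $\bar{R}_f$, $F$ and $\mathcal{R}$ are all piecewise-linear interpolants (the text does not prescribe an interpolation scheme), and that the frequencies $\omega_k^f$ are increasing and lie strictly inside $(\omega_1, \omega_m)$. The function and argument names are illustrative, and the response used in the usage lines is synthetic.

```python
import numpy as np


def sprp_predict(omega, rf_ref, omega_f, d_omega, d_r, d_r_ends):
    """Shape-preserving response prediction following (8.9)-(8.10).

    omega    : frequency sweep omega_1..omega_m (increasing)
    rf_ref   : R_f(x^(i)) sampled at omega
    omega_f  : frequencies omega_k^f of the K characteristic points of R_f(x^(i))
    d_omega  : Delta omega_k = omega_k^c - omega_k^c0 (low-fidelity frequency shifts)
    d_r      : Delta r_k = r_k^c - r_k^c0 (low-fidelity magnitude shifts)
    d_r_ends : (Delta r_0, Delta r_{K+1}), low-fidelity change at omega_1 and omega_m
    """
    omega = np.asarray(omega, dtype=float)
    omega_f = np.asarray(omega_f, dtype=float)
    knots = np.concatenate(([omega[0]], omega_f, [omega[-1]]))

    # Scaling function F: omega_k^f -> omega_k^f - Delta omega_k, end points fixed.
    f_values = np.concatenate(([omega[0]], omega_f - np.asarray(d_omega, float), [omega[-1]]))
    warped = np.interp(omega, knots, f_values)

    # Magnitude function: interpolates Delta r_0, Delta r_1..Delta r_K, Delta r_{K+1}.
    r_values = np.concatenate(([d_r_ends[0]], np.asarray(d_r, float), [d_r_ends[1]]))
    magnitude_shift = np.interp(omega, knots, r_values)

    # R_f-bar: linear interpolation of the reference high-fidelity response.
    return np.interp(warped, omega, rf_ref) + magnitude_shift


# Synthetic illustration: a dip at 13 GHz, low-fidelity feature shifted by +0.5 GHz.
w = np.linspace(8.0, 18.0, 101)
rf = -20.0 * np.exp(-((w - 13.0) / 2.0) ** 2)
predicted = sprp_predict(w, rf, [13.0], [0.5], [0.0], (0.0, 0.0))
```

With piecewise-linear warping pinned at $\omega_1$ and $\omega_m$, a feature is moved toward the predicted location but not necessarily by exactly $\Delta\omega_k$; in the synthetic example the dip moves above 13 GHz. Note that no parameters are extracted: the only inputs are the reference response and the translation vectors.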

It should be emphasized that, in shape-preserving response prediction, a physically
based low-fidelity model is critical for the method's performance. SPRP can be
characterized as a non-parametric, nonlinear and design-variable-dependent
response correction, and it is therefore distinct from any known space
mapping approach. Another important feature that differentiates SPRP from
space mapping and other approaches (e.g., tuning) is implementation simplicity.
Unlike space mapping, SPRP does not use any extractable parameters (which are
normally found by solving a separate nonlinear minimization problem), the prob-
lem of surrogate model selection [38], [39] (i.e., the choice of the transforma-
tion and its parameters) does not exist, and the interaction between the models is
very simple (only through the translation vectors used in (8.9), (8.10)). Unlike tuning
methodologies, SPRP does not require any modification of the optimized structure
(such as "cutting" and insertion of the tuning components [50]).
Fig. 8.7 SPRP concept: (a) Example low-fidelity model response at the design $x^{(i)}$, $R_c(x^{(i)})$
(solid line), the low-fidelity model response at $x$, $R_c(x)$ (dotted line), characteristic points of
$R_c(x^{(i)})$ (circles) and $R_c(x)$ (squares), and the translation vectors (short lines); (b) High-
fidelity model response at $x^{(i)}$, $R_f(x^{(i)})$ (solid line) and the predicted high-fidelity model response
at $x$ (dotted line) obtained using SPRP based on the characteristic points of Fig. 8.7(a); characteristic
points of $R_f(x^{(i)})$ (circles) and the translation vectors (short lines) were used to find the character-
istic points (squares) of the predicted high-fidelity model response; low-fidelity model responses
$R_c(x^{(i)})$ and $R_c(x)$ are plotted using thin solid and dotted lines, respectively [51].



If one-to-one correspondence between the characteristic points of the high- and
low-fidelity models is not satisfied despite the use of the coarse-mesh EM-based low-
fidelity model, the sets of corresponding characteristic points can be generated
based not on distinctive features of the responses (e.g., characteristic response lev-
els or local minima/maxima) but by introducing additional points that are equally
spaced in frequency and inserted between well-defined points [51]. These addi-
tional points not only ensure that the shape-preserving response prediction model
(8.9), (8.10) is well defined but also allow us to capture the response shape of the
models even though the number of distinctive features (e.g., local maxima and
minima) is different for the high- and low-fidelity models.
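The equally spaced auxiliary points can be generated with a few lines of NumPy. The sketch assumes that the well-defined points of each response are already known and matched one-to-one, and that the same number of auxiliary points is inserted between each consecutive pair for both models; this is one simple way to keep the point sets the same size and is an assumption, not necessarily the exact rule used in [51]. The function names and frequencies are hypothetical.

```python
import numpy as np


def augment_points(anchor_freqs, n_between):
    """Insert n_between equally spaced frequencies between consecutive anchors."""
    anchors = np.sort(np.asarray(anchor_freqs, dtype=float))
    points = [anchors[0]]
    for lo, hi in zip(anchors[:-1], anchors[1:]):
        # linspace includes both ends: skip lo (already stored), keep hi.
        points.extend(np.linspace(lo, hi, n_between + 2)[1:])
    return np.array(points)


def characteristic_points(omega, response, freqs):
    """Return a (len(freqs), 2) array of [frequency, magnitude] pairs."""
    return np.column_stack((freqs, np.interp(freqs, omega, response)))


# Well-defined points of the two responses (hypothetical values, in GHz).
pts_high = augment_points([9.0, 12.0, 15.0], 2)
pts_low = augment_points([9.5, 12.5, 15.5], 2)
```

Because both sets receive the same number of inserted points per interval, the $k$th point of one response always has a partner in the other, which is what the translation vectors require.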
8.5.4 Multi-fidelity Optimization Using Coarse-Discretization EM Models



As mentioned in Section 8.4, the most versatile type of physically based low-
fidelity model in microwave engineering is the one obtained through EM simula-
tion of a coarsely discretized structure of interest. The computational cost of the
model and its accuracy can be easily controlled by changing the discretization 
density. This feature has been exploited in the multi-fidelity optimization algo- 
rithm introduced in [52]. 

The design optimization methodology of [52] is based on a family of coarse-
discretization models $\{R_{c.j}\}$, $j = 1, \ldots, K$, all evaluated by the same EM solver as the
one used for the high-fidelity model. The discretization of the model $R_{c.j+1}$ is finer than
that of the model $R_{c.j}$, which results in better accuracy but also a longer evaluation
time. In practice, the number of coarse-discretization models is two or three.

Having the optimized design $x^{(K)}$ of the last (and finest) coarse-discretization
model $R_{c.K}$, the model is evaluated at all perturbed designs around $x^{(K)}$, i.e.,
$x^{(k)} = [x_1^{(K)} \ldots x_{|k|}^{(K)} + \mathrm{sign}(k) \cdot d_{|k|} \ldots x_n^{(K)}]^T$, $k = -n, -n+1, \ldots, n-1, n$. The notation
$R^{(k)} = R_{c.K}(x^{(k)})$ is adopted here. These data can be used to refine the final design
without directly optimizing $R_f$. Instead, an approximation model involving $R^{(k)}$ is set
up and optimized in the neighborhood of $x^{(K)}$ defined as $[x^{(K)} - d, x^{(K)} + d]$, where
$d = [d_1\ d_2\ \ldots\ d_n]^T$. The size of the neighborhood can be selected based on sensitivity
analysis of $R_{c.1}$ (the cheapest of the coarse-discretization models); usually $d$ equals
2 to 5 percent of $x^{(K)}$.

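Here, the approximation is performed using a reduced quadratic model
$q(x) = [q_1\ q_2\ \ldots\ q_m]^T$, defined as

$$q_j(x) = q_j([x_1 \ldots x_n]^T) = \lambda_{j.0} + \lambda_{j.1} x_1 + \cdots + \lambda_{j.n} x_n + \lambda_{j.n+1} x_1^2 + \cdots + \lambda_{j.2n} x_n^2 \qquad (8.11)$$

Coefficients $\lambda_{j.r}$, $j = 1, \ldots, m$, $r = 0, 1, \ldots, 2n$, can be uniquely obtained by solving
the linear regression problems

$$\begin{bmatrix} 1 & x_1^{(-n)} & \cdots & x_n^{(-n)} & (x_1^{(-n)})^2 & \cdots & (x_n^{(-n)})^2 \\ 1 & x_1^{(-n+1)} & \cdots & x_n^{(-n+1)} & (x_1^{(-n+1)})^2 & \cdots & (x_n^{(-n+1)})^2 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_1^{(n)} & \cdots & x_n^{(n)} & (x_1^{(n)})^2 & \cdots & (x_n^{(n)})^2 \end{bmatrix} \begin{bmatrix} \lambda_{j.0} \\ \lambda_{j.1} \\ \vdots \\ \lambda_{j.2n} \end{bmatrix} = \begin{bmatrix} R_j^{(-n)} \\ R_j^{(-n+1)} \\ \vdots \\ R_j^{(n)} \end{bmatrix} \qquad (8.12)$$

where $x_l^{(k)}$ is the $l$th component of the vector $x^{(k)}$, and $R_j^{(k)}$ is the $j$th component of
the vector $R^{(k)}$.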
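Because (8.12) is a square linear system for each response component $j$, all $m$ systems can be solved at once by stacking the responses $R^{(k)}$ as columns. The NumPy sketch below builds the $2n+1$ perturbed designs, the regression matrix of (8.12), and evaluates $q(x)$ from (8.11). The function names are hypothetical; in practice, centring and scaling the design variables before forming the matrix would improve its conditioning, which is not done here.

```python
import numpy as np


def perturbed_designs(x_center, d):
    """Designs x^(k), k = -n..n: component |k| of x_center moved by sign(k)*d_|k|."""
    x_center = np.asarray(x_center, dtype=float)
    n = x_center.size
    designs = []
    for k in range(-n, n + 1):
        x = x_center.copy()
        if k != 0:
            x[abs(k) - 1] += np.sign(k) * d[abs(k) - 1]
        designs.append(x)
    return np.array(designs)  # shape (2n+1, n)


def regression_matrix(X):
    """Rows [1, x_1..x_n, x_1^2..x_n^2] of the system (8.12)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.hstack((np.ones((X.shape[0], 1)), X, X**2))


def fit_reduced_quadratic(X, R):
    """Coefficients lambda_{j.r}; column j solves (8.12) for response component j.

    X: (2n+1, n) perturbed designs; R: (2n+1, m) responses R^(k) = R_c.K(x^(k)).
    """
    return np.linalg.solve(regression_matrix(X), np.asarray(R, dtype=float))


def eval_q(lam, x):
    """Evaluate q(x) = [q_1(x) ... q_m(x)]^T from (8.11)."""
    return (regression_matrix(x) @ lam)[0]
```

Only $2n+1$ evaluations of $R_{c.K}$ are needed, which is why the model has no cross terms: a full quadratic would require many more samples.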

In order to account for the unavoidable misalignment between $R_{c.K}$ and $R_f$, instead
of optimizing the quadratic model $q$, it is recommended to optimize a corrected
model $q(x) + [R_f(x^{(K)}) - R_{c.K}(x^{(K)})]$ that ensures zero-order consistency [34] be-
tween $R_{c.K}$ and $R_f$. The refined design can then be found as

$$x^* = \arg\min_{x^{(K)} - d \le x \le x^{(K)} + d} U\big(q(x) + [R_f(x^{(K)}) - R_{c.K}(x^{(K)})]\big) \qquad (8.13)$$
This kind of correction is also known as output space mapping [30]. If necessary,
the step (8.13) can be performed a few times starting from a refined design, i.e.,
$x^{*(l+1)} = \arg\min_{x^{(K)} - d \le x \le x^{(K)} + d} U\big(q(x) + [R_f(x^{*(l)}) - R_{c.K}(x^{*(l)})]\big)$, $l = 0, 1, \ldots$,
with $x^{*(0)} = x^*$ (each iteration requires only one evaluation of $R_f$).
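The bounded problem (8.13) can be handed to any box-constrained optimizer. Purely for illustration, the sketch below substitutes an exhaustive grid search over $[x^{(K)} - d, x^{(K)} + d]$, which is practical only for a few design variables; $q$ and the objective $U$ are passed in as callables, and all names are hypothetical. Calling it again with $R_f(x^*)$ and $R_{c.K}(x^*)$ in place of the values at $x^{(K)}$ gives the repeated refinement step.

```python
import itertools

import numpy as np


def refine_design(q, U, rf_at_ref, rcK_at_ref, x_K, d, n_grid=21):
    """Approximate (8.13) by grid search over the box [x_K - d, x_K + d].

    q          : callable, x -> surrogate response vector (e.g., the reduced quadratic model)
    U          : callable, response vector -> scalar design objective
    rf_at_ref  : R_f evaluated at the reference design (x^(K) in (8.13))
    rcK_at_ref : R_c.K evaluated at the same reference design
    """
    x_K = np.asarray(x_K, dtype=float)
    d = np.asarray(d, dtype=float)
    # Zero-order (output space mapping) correction term.
    shift = np.asarray(rf_at_ref, dtype=float) - np.asarray(rcK_at_ref, dtype=float)
    axes = [np.linspace(c - h, c + h, n_grid) for c, h in zip(x_K, d)]
    best_x, best_u = None, np.inf
    for point in itertools.product(*axes):
        x = np.array(point)
        u = U(q(x) + shift)
        if u < best_u:
            best_x, best_u = x, u
    return best_x, best_u


# Toy use: scalar response q(x) = x_1 + x_2, target 5, fine/coarse offset 0.5.
x_star, u_star = refine_design(lambda x: np.array([x[0] + x[1]]),
                               lambda r: abs(r[0] - 5.0),
                               [1.0], [0.5], [2.0, 2.0], [0.5, 0.5])
```

Only the correction term requires the fine model; the search itself touches $q$ alone.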

The design optimization procedure can be summarized as follows (input argu-
ments are: the initial design $x^{(0)}$ and the number of coarse-discretization models $K$):

1. Set $j = 1$;

2. Optimize the coarse-discretization model $R_{c.j}$ to obtain a new design $x^{(j)}$ using
$x^{(j-1)}$ as a starting point;

3. Set $j = j + 1$; if $j \le K$, go to 2;

4. Obtain a refined design $x^*$ as in (8.13);

5. END.

Note that the original model $R_f$ is only evaluated at the final stage (step 4) of the
optimization process. Operation of the algorithm is illustrated in Fig. 8.8. Coarse-
discretization models can be optimized using any available algorithm.
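The five steps reduce to a short loop. In the sketch below, `optimize` (any single-model optimizer) and `refine` (step 4, e.g., fitting $q$ and solving (8.13)) are hypothetical user-supplied callables, so the code fixes only the control flow: each coarse-discretization model is started from the previous optimum, and $R_f$ is touched only inside `refine`. The grid optimizer and the quadratic "models" in the usage lines are toy stand-ins.

```python
import numpy as np


def multi_fidelity_optimize(x0, coarse_models, optimize, refine):
    """Steps 1-5: optimize R_c.1, ..., R_c.K in turn, then refine around x^(K).

    coarse_models : sequence of callables, ordered from coarsest to finest mesh
    optimize      : callable (model, start) -> optimum of that model
    refine        : callable x^(K) -> refined design x* (step 4)
    """
    x = x0
    history = [x0]
    for model in coarse_models:   # j = 1, ..., K (steps 1 to 3)
        x = optimize(model, x)    # x^(j), started from x^(j-1)
        history.append(x)
    return refine(x), history     # step 4; R_f is evaluated only inside refine


def grid_optimize(model, start, half_width=1.0, n=2001):
    """Toy stand-in optimizer: best point of a 1-D grid around the start."""
    grid = np.linspace(start - half_width, start + half_width, n)
    return float(grid[np.argmin([model(g) for g in grid])])


# Toy 'coarse models' whose optima drift slightly as the mesh is refined.
models = [lambda x: (x - 1.0) ** 2, lambda x: (x - 1.1) ** 2, lambda x: (x - 1.15) ** 2]
x_star, hist = multi_fidelity_optimize(0.0, models, grid_optimize, lambda x: x)
```

Passing the identity as `refine` simply returns $x^{(K)}$; a real implementation would plug in the quadratic-model refinement of (8.13).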




Fig. 8.8 Operation of the multi-fidelity design optimization procedure for K = 3 (three coarse-
discretization models). The design $x^{(j)}$ is obtained as the optimal solution of the model $R_{c.j}$,
$j = 1, 2, 3$. A reduced second-order approximation model $q$ is set up in the neighborhood of
$x^{(3)}$ (gray area), and the final design $x^*$ is obtained by optimizing the reduced model $q$ as in (8.13).



Typically, the major difference between the responses of $R_f$ and the coarse-
discretization models $R_{c.j}$ is that they are shifted in frequency. This difference can
be easily absorbed by frequency-shifting the design specifications while optimiz-
ing a model $R_{c.j}$. More specifically, suppose that the design specifications are de-
scribed as $\{\omega_{k.L}, \omega_{k.H}; s_k\}$, $k = 1, \ldots, n_s$ (e.g., the specifications $|S_{21}| \ge -3$ dB for
3 GHz $\le \omega \le$ 4 GHz, $|S_{21}| \le -20$ dB for 1 GHz $\le \omega \le$ 2 GHz and $|S_{21}| \le -20$ dB for
5 GHz $\le \omega \le$ 7 GHz would be described as {3, 4; -3}, {1, 2; -20}, and {5, 7; -20}). If the
average frequency shift between the responses of $R_{c.j}$ and $R_{c.j+1}$ is $\Delta\omega$, this difference
can be absorbed by modifying the design specifications to $\{\omega_{k.L} - \Delta\omega, \omega_{k.H} - \Delta\omega; s_k\}$,
$k = 1, \ldots, n_s$.
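The frequency shift of the specifications is a one-line transformation of the $\{\omega_{k.L}, \omega_{k.H}; s_k\}$ triples. The sketch below encodes each specification as a Python tuple `(omega_L, omega_H, s)` in GHz and dB (a representation chosen here for illustration; like the triples in the text, it does not record whether the bound is an upper or a lower one) and applies it to the example specifications above with a hypothetical shift $\Delta\omega = 0.5$ GHz.

```python
from typing import List, Tuple

Spec = Tuple[float, float, float]  # (omega_L in GHz, omega_H in GHz, level s in dB)


def shift_specs(specs: List[Spec], d_omega: float) -> List[Spec]:
    """Return {omega_L - d_omega, omega_H - d_omega; s} for every specification."""
    return [(lo - d_omega, hi - d_omega, level) for lo, hi, level in specs]


example = [(3.0, 4.0, -3.0), (1.0, 2.0, -20.0), (5.0, 7.0, -20.0)]
shifted = shift_specs(example, 0.5)
# shifted == [(2.5, 3.5, -3.0), (0.5, 1.5, -20.0), (4.5, 6.5, -20.0)]
```

Only the band edges move; the levels $s_k$ are left untouched, since the correction targets the frequency shift between meshes.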

As mentioned above, the number $K$ of coarse-discretization models is typically
two or three. The first coarse-discretization model $R_{c.1}$ should be set up so that its
evaluation time is at least 30 to 100 times shorter than the evaluation time of the
fine model. The reason is that the initial design may be quite poor, so the ex-
pected number of evaluations of $R_{c.1}$ is usually large. By keeping $R_{c.1}$ fast, one can
control the computational overhead related to its optimization. The accuracy of $R_{c.1}$ is
not critical because its optimal design is only supposed to give a rough estimate of
the fine model optimum. The second (and possibly the third) coarse-discretization
model should be more accurate but still at least about 10 times faster than the fine
model. This can be achieved by proper manipulation of the solver mesh density.

8.5.5 Optimization Using Adaptively Adjusted Design Specifications

The techniques described in Sections 8.5.1 to 8.5.4 aim at correcting the low-
fidelity model so that it becomes, at least locally, an accurate representation of the
high-fidelity model. An alternative way of exploiting low-fidelity models in simu- 
lation-driven design of microwave structures is to modify the design specifications 
in such a way that the updated specifications reflect the discrepancy between the 
models. This approach is extremely simple to implement because no changes of 
the low-fidelity model are necessary. 

The adaptively adjusted design specifications optimization procedure intro- 
duced in [53] consists of the following two simple steps that can be iterated if 
necessary: 

1. Modify the original design specifications in order to take into account the 
difference between the responses of R f and R c at their characteristic points. 

2. Obtain a new design by optimizing the coarse model with respect to the 
modified specifications. 

Characteristic points of the responses should correspond to the design specifica-
tion levels. They should also include local maxima/minima of the respective re-
sponses at which the specifications may not be satisfied. Figure 8.9(a) shows the fine
and coarse model responses at the optimal design of $R_c$, corresponding to the band-
stop filter example considered in [53]; design specifications are indicated using
horizontal lines. Figure 8.9(b) shows characteristic points of $R_f$ and $R_c$ for the
bandstop filter example. The points correspond to the -3 dB and -30 dB levels as well
as to the local maxima of the responses. As one can observe in Fig. 8.9(b), the selec-
tion of points is rather straightforward.

In the first step of the optimization procedure, the design specifications are modi- 
fied (or mapped) so that the level of satisfying/violating the modified specifications 
by the coarse model response corresponds to the satisfaction/violation levels of the 
original specifications by the fine model response. 

More specifically, for each edge of the specification line, the edge frequency is 
shifted by the difference of the frequencies of the corresponding characteristic 
points, e.g., the left edge of the specification line of -30 dB is moved to the right 
by about 0.7 GHz, which is equal to the length of the line connecting the corre- 
sponding characteristic points in Fig. 8.9(b). Similarly, the specification levels are 
shifted by the difference between the local maxima/minima values for the respec-
tive points, e.g., the -30 dB level is shifted down by about 8.5 dB because of the
difference of the local maxima of the corresponding characteristic points of $R_f$ and
$R_c$. Modified design specifications are shown in Fig. 8.9(c).
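The mapping in step 1 can be written as an edge-by-edge and level-by-level shift. In the sketch below, each specification is a tuple `(omega_L, omega_H, level)`, and the shifts are assumed to be measured as coarse minus fine, i.e., $\omega^c - \omega^f$ for the characteristic points tied to each edge and $r^c - r^f$ for the point tied to the level; this sign convention is an assumption consistent with the description above (a coarse-model feature at a higher frequency moves the edge to the right). The function name and the usage numbers are hypothetical and only echo the magnitudes quoted for Fig. 8.9.

```python
from typing import List, Tuple

Spec = Tuple[float, float, float]  # (omega_L in GHz, omega_H in GHz, level in dB)


def modify_specs(specs: List[Spec],
                 edge_shifts: List[Tuple[float, float]],
                 level_shifts: List[float]) -> List[Spec]:
    """Map the original specifications for use with the coarse model.

    edge_shifts[k]  = (omega_c - omega_f) at the left and right edge of spec k
    level_shifts[k] = r_c - r_f at the characteristic point tied to spec k's level
    """
    if not (len(specs) == len(edge_shifts) == len(level_shifts)):
        raise ValueError("one edge pair and one level shift per specification")
    return [
        (lo + d_lo, hi + d_hi, level + d_level)
        for (lo, hi, level), (d_lo, d_hi), d_level in zip(specs, edge_shifts, level_shifts)
    ]


# Hypothetical -30 dB spec: left edge moved right by 0.7 GHz, level down by 8.5 dB.
modified = modify_specs([(12.0, 14.0, -30.0)], [(0.7, 0.0)], [-8.5])
```

Step 2 then optimizes the coarse model against `modified`; no model parameters are adjusted at any point.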

The coarse model is subsequently optimized with respect to the modified speci-
fications, and the new design obtained this way is treated as an approximate
solution to the original design problem (i.e., optimization of the fine model with
respect to the original specifications). Steps 1 and 2 (listed above) can be repeated
if necessary. Substantial design improvement is typically observed after the first
iteration; however, additional iterations may bring further enhancement [53].
Fig. 8.9 Bandstop filter example (responses of $R_f$ and $R_c$ are marked with solid and dashed
lines, respectively): (a) fine and coarse model responses at the initial design (optimum of $R_c$)
as well as the original design specifications, (b) characteristic points of the responses corre-
sponding to the specification levels (here, -3 dB and -30 dB) and to the local response
maxima, (c) fine and coarse model responses at the initial design and the modified design
specifications.
It is assumed that the coarse model is physically based; in particular, that the
adjustment of the design variables has a similar effect on the response for both $R_f$
and $R_c$. In such a case, the coarse model design that is obtained in the second stage of
the procedure (i.e., optimal with respect to the modified specifications) will be
(almost) optimal for $R_f$ with respect to the original specifications. As shown in
Fig. 8.9, absolute matching between the models is not as important as shape similarity.

In order to reduce the overhead related to coarse model optimization (step 2 of
the procedure), the coarse model should be computationally as cheap as possible.
For that reason, equivalent circuits or models based on analytical formulas are pre-
ferred. Unfortunately, such models may not be available for many structures, in-
cluding antennas, certain types of waveguide filters and substrate integrated cir-
cuits. In all such cases, it is possible to implement the coarse model using the
same EM solver as the one used for the fine model but with coarser discretization.
To some extent, this is the easiest and the most generic way of creating the coarse
model. It also allows a convenient adjustment of the trade-off between the quality
of $R_c$ (i.e., its accuracy in representing the fine model) and its computational cost.
For popular EM solvers (e.g., CST Microwave Studio [9], Sonnet em [17], FEKO
[18]), it is possible to make the coarse model 20 to 100 times faster than the fine model
while maintaining accuracy that is sufficient for the technique described here.

When compared to space mapping and tuning, the adaptively adjusted design 
specifications technique appears to be much simpler to implement. Unlike space 
mapping, it does not use any extractable parameters (which are normally found by 
solving a separate nonlinear minimization problem), the problem of the surrogate 
model selection [38], [39] (i.e., the choice of the transformation and its parameters) 
does not exist, and the interaction between the models is very simple (only through 
the design specifications). Unlike tuning methodologies, the method presented in this 
section does not require any modification of the optimized structure (such as "cut-
ting" and insertion of the tuning components [50]). The lack of extractable parame-
ters is an additional advantage compared with some other approaches (e.g., space map-
ping) because the computational overhead related to parameter extraction, while
negligible for very fast coarse models (e.g., equivalent circuits), may substantially in-
crease the overall design cost if the coarse model is relatively expensive (e.g., imple-
mented through coarse-discretization EM simulation).

If the similarity between the fine and coarse model responses is not sufficient, the
adaptive design specifications technique may not work well. In many cases, how-
ever, using different reference designs for the fine and coarse models may help. In
particular, $R_c$ can be optimized with respect to the modified specifications starting
not from $x^{(0)}$ (the optimal solution of $R_c$ with respect to the original specifications),
but from another design, say $x_r^{(0)}$, at which the response of $R_c$ is as similar to the re-
sponse of $R_f$ at $x^{(0)}$ as possible. Such a design can be obtained as follows [7]:
$$x_r^{(0)} = \arg\min_{z} \left\| R_f(x^{(0)}) - R_c(z) \right\| \qquad (8.14)$$

At iteration $i$ of the optimization process, the optimal design of the coarse model
$R_c$ with respect to the modified specifications, $x_c^{(i)}$, has to be translated to the cor-
responding fine model design, $x^{(i)}$, as follows: $x^{(i)} = x_c^{(i)} + (x^{(0)} - x_r^{(0)})$. Note that the
preconditioning procedure (8.14) is performed only once for the entire optimiza-
tion process. The idea of coarse model preconditioning is borrowed from space
mapping (more specifically, from the original space mapping concept [7]). In prac-
tice, the coarse model can be "corrected" to reduce its misalignment with the fine
model using any available degrees of freedom, for example, preassigned parameters
as in implicit space mapping [33].
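Both the preconditioning step (8.14) and the design translation are easy to express in code. In the sketch below, a finite set of candidate coarse-model designs stands in for a proper optimizer in (8.14) (an assumption made only to keep the example self-contained), and `coarse_model` is a hypothetical callable returning $R_c(z)$ as an array; the one-variable numbers are made up.

```python
import numpy as np


def precondition(rf_x0, coarse_model, candidates):
    """Approximate (8.14): the candidate z minimizing ||R_f(x^(0)) - R_c(z)||."""
    rf_x0 = np.asarray(rf_x0, dtype=float)
    errors = [np.linalg.norm(rf_x0 - coarse_model(np.asarray(z, dtype=float)))
              for z in candidates]
    return np.asarray(candidates[int(np.argmin(errors))], dtype=float)


def coarse_to_fine(x_c, x0, x_r0):
    """Translate a coarse-model optimum: x^(i) = x_c^(i) + (x^(0) - x_r^(0))."""
    return (np.asarray(x_c, dtype=float)
            + (np.asarray(x0, dtype=float) - np.asarray(x_r0, dtype=float)))


# Hypothetical one-variable example.
x_r0 = precondition([3.0], lambda z: 2.0 * z, [[1.0], [1.5], [2.0]])  # selects [1.5]
x_1 = coarse_to_fine([1.6], [1.2], x_r0)
```

Since (8.14) is solved only once, its cost is amortized over the whole run; every later iteration needs just the vector addition in `coarse_to_fine`.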

8.6 Summary 

Simulation-driven optimization has become an important design tool in contempo-
rary microwave engineering. Its importance is expected to grow in the future due
to the rise of new technologies and novel classes of devices and systems
for which traditional design methods are not applicable. The surrogate-based 
approach and methods described in this chapter can make the electromagnetic- 
simulation-based design optimization feasible and cost efficient. In Chapter 12, a 
number of applications of the techniques presented here are demonstrated in the 
design of common microwave devices including filters, antennas and interconnect 
structures. 
Chapter 9

Variable-Fidelity Aerodynamic Shape Optimization 



Leifur Leifsson and Slawomir Koziel 



Abstract. Aerodynamic shape optimization (ASO) plays an important role in the 
design of aircraft, turbomachinery and other fluid machinery. Simulation-driven 
ASO involves the coupling of computational fluid dynamics (CFD) solvers with 
numerical optimization methods. Although relatively mature and widely
used, ASO is still being improved and numerous challenges remain. This chapter
provides an overview of simulation-driven ASO methods, with an emphasis on 
surrogate-based optimization (SBO) techniques. In SBO, a computationally cheap 
surrogate model is used in lieu of an accurate high-fidelity CFD simulation in the 
optimization process. Here, a particular focus is given to SBO exploiting surrogate 
models constructed from corrected physics-based low-fidelity models, often 
referred to as variable- or multi-fidelity optimization. 

9.1 Introduction 

Aerodynamic and hydrodynamic design optimization is of primary importance in 
several disciplines [1-3]. In aircraft design, both for conventional transport aircraft 
and unmanned air vehicles, the aerodynamic wing shape is designed to provide 
maximum efficiency under a variety of takeoff, cruise, maneuver, loiter, and land- 
ing conditions [1, 4-7]. Constraints on aerodynamic noise are also becoming in- 
creasingly important [8, 9]. In the design of turbines, such as gas, steam, or wind 
turbines, the blades are designed to maximize energy output for a given working 
fluid and operating conditions [2, 10]. The shapes of the propeller blades of ships 
are optimized to increase efficiency [11]. The fundamental design problem, com- 
mon to all these disciplines, is to design a streamlined wing (or blade) shape that 
provides the desired performance for a given set of operating conditions, while at 
the same time fulfilling one or multiple design constraints [12-20]. 



Leifur Leifsson ■ Slawomir Koziel 

Engineering Optimization & Modeling Center, 

School of Science and Engineering, Reykjavik University, 

Menntavegur 1, 101 Reykjavik, Iceland 

email: {leifurth, koziel}@ru.is



S. Koziel & X.-S. Yang (Eds.): Comput. Optimization, Methods and Algorithms, SCI 356, pp. 179–…
springerlink.com © Springer-Verlag Berlin Heidelberg 2011







Fig. 9.1 A CAD drawing of a typical transport aircraft with a turbofan jet engine. The air- 
craft wing and the turbine blades of the turbofan engines are streamlined aerodynamic 
surfaces defined by airfoil sections 

In the early days of engineering design, the designer would have to rely on ex- 
perience and physical experiments. Nowadays, most engineering design is per- 
formed using computational tools, especially in the early phases, i.e., conceptual 
and preliminary design. This is commonly referred to as simulation-driven (or si- 
mulation-based) design. Physical experiments are normally performed at the final 
design stages only, mostly for validation purposes. The fidelity of the computa- 
tional methods used in design has been steadily increasing. Over forty years ago, 
the computational fluid dynamic (CFD) tools were only capable of simulating po- 
tential flow past simplified wing configurations [1, 21]. Today's commercial CFD 
tools, e.g., [22, 23], are capable of simulating three-dimensional viscous flows 
past full aircraft configurations using the Reynolds-Averaged Navier-Stokes 
(RANS) equations with the appropriate turbulence models [24]. 

The use of optimization methods in the design process, either as a design sup-
port tool or for automated design, has now become commonplace. In aircraft de-
sign, the use of numerical optimization techniques began in the mid-1970s by
coupling CFD tools with gradient-based optimization methods [1]. Substantial 
progress has been made since then, and the exploitation of higher-fidelity meth- 
ods, coupled with optimization techniques, has led to improved design efficiency 
[4, 12-16]. An overview of the relevant work is provided in the later sections. In 
spite of being widespread, simulation-driven aerodynamic design optimization 
involves numerous challenges: 

• High-fidelity CFD simulations are computationally expensive. A CFD si- 
mulation involves solving the governing flow equations on a computa- 
tional mesh. The resulting system of algebraic equations can be very large, 
with a number of unknowns equal to the product of the number of flow va- 
riables and the number of mesh points. For a three-dimensional turbulent 
RANS flow simulation with one million mesh points, and a two-equation 
turbulence model, there will be seven flow variables, leading to an alge- 
braic system of seven million equations with seven million unknowns. De- 
pending on the computational resources, this kind of simulation can take 
many days on a parallel computer [25]. In the corresponding
two-dimensional case, simulations on meshes with over one hundred thou- 
sand mesh points are not uncommon. A single simulation in this case can 
take over one hour on a typical desktop computer. 

• Design optimization normally requires a large number of simulations. For example, even in the case of a two-dimensional airfoil shape optimization with three design variables, a gradient-based optimization method can require over one hundred function evaluations, and the optimization process could take as long as one week [26]. For higher-dimensional problems, the required number of function evaluations may be substantially larger.

• The number of design variables can be large. Typically, an airfoil shape can be described accurately with, say, ten to fifteen design variables [27]. An entire
transport wing shape might require a few (say three to seven) airfoils at 
various spanwise locations, leading to at least thirty design variables, aside 
from the planform variables (e.g., span, sweep, twist) [4]. 

• Multiple operating conditions [15] (e.g., a range of Mach numbers) and multiple objectives (e.g., minimum take-off gross weight, minimum drag, minimum noise) [8] may need to be considered in the design process.

• Uncertainty in the operating conditions and in the airfoil shape may need to be taken into account, leading to the need to carry out a stochastic analysis, which is always more time-consuming than a deterministic one [28].

• The simulation results normally include numerical noise [29]. This can be 
due to partially converged solutions or due to badly generated computa- 
tional meshes. 



• The objectives, e.g., the drag force, can be numerically sensitive to mesh resolution [30].

• Coupling of the aerodynamics with other disciplines should be considered as well. For example, the aerodynamic loads may need to be coupled with the wing structure by way of a structural analysis [31]. This is referred to as multidisciplinary design and optimization (MDO).
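The unknown count in the first bullet is simply the number of flow variables multiplied by the number of mesh points. The sketch below makes that arithmetic explicit. It assumes the mean-flow variables are density, one velocity component per spatial dimension and total energy, with each turbulence-model equation adding one more transported variable. The function name is illustrative only.

```python
def rans_unknowns(n_mesh_points: int, n_dims: int, n_turb_eqs: int = 2) -> int:
    """Number of discrete unknowns of a RANS simulation (assumed layout).

    Mean flow: density, n_dims velocity components and total energy;
    a two-equation turbulence model adds two transported quantities.
    """
    n_flow_vars = 1 + n_dims + 1 + n_turb_eqs
    return n_flow_vars * n_mesh_points

# Three-dimensional case from the text: 7 variables on one million points.
print(rans_unknowns(1_000_000, n_dims=3))
# Two-dimensional case on 100,000 points: 6 variables per point.
print(rans_unknowns(100_000, n_dims=2))
```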

The above remarks indicate that high-fidelity CFD simulations are computation- 
ally far too expensive to be used in a direct, simulation-based design optimization, 
especially when using conventional, gradient-based techniques. Any further im- 
provement to the overall efficiency of the design process can only be achieved by 
developing more efficient optimization methods, i.e., reducing the number of 
high-fidelity CFD simulations required to yield an optimized design, and/or em- 
ploying more powerful computing resources. 

An important research area in the field of aerodynamic optimization focuses on surrogate-based optimization (SBO) techniques [32, 33]. One of the major objectives is to reduce the number of high-fidelity model evaluations, thereby making the optimization process more efficient. In SBO, the accurate but computationally expensive high-fidelity CFD simulations are replaced, in the optimization process, by a cheap surrogate model. In this chapter, we provide a
review of some representative works on aerodynamic shape optimization relating 
both to the direct [1, 4, 10, 12-21] and the surrogate-based optimization ap- 
proaches [29, 31-33]. In particular, the main emphasis of the chapter is on the 
SBO approach exploiting surrogate models constructed from corrected physics- 
based low-fidelity models [26, 34-38]. This is often referred to as variable- or 
multi-fidelity optimization. 
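A very simple way to "correct" a low-fidelity model is to shift it so that it matches the high-fidelity model at a reference design. This is a zero-order additive correction, and it is shown here only as an illustration. It is not necessarily the correction scheme used later in this chapter. Both model functions are hypothetical stand-ins for CFD solvers of different fidelity.

```python
from typing import Callable

import numpy as np

Model = Callable[[np.ndarray], float]

def additive_surrogate(high: Model, low: Model, x_ref: np.ndarray) -> Model:
    """Return s(x) = low(x) + [high(x_ref) - low(x_ref)].

    Only one high-fidelity evaluation (at x_ref) is spent; afterwards the
    surrogate costs as much as the low-fidelity model.
    """
    shift = high(x_ref) - low(x_ref)
    return lambda x: low(x) + shift

# Hypothetical models: the cheap one has the right trend but is biased.
def high_fidelity(x: np.ndarray) -> float:
    return float((x[0] - 1.0) ** 2 + 0.5 * x[1] ** 2 + 0.1)

def low_fidelity(x: np.ndarray) -> float:
    return float((x[0] - 0.9) ** 2 + 0.5 * x[1] ** 2)

x0 = np.array([0.0, 1.0])
s = additive_surrogate(high_fidelity, low_fidelity, x0)
print(s(x0), high_fidelity(x0))  # agree at the reference design (up to rounding)
```

Away from x0, the surrogate inherits the trend of the low-fidelity model. More elaborate corrections, such as multiplicative or first-order corrections, also try to match gradients.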

The chapter is organized as follows. In Section 9.2, we formulate the aerody- 
namic shape optimization problem using the example of airfoil design. The basics 
of the CFD modeling and simulation process are described in Section 9.3. Direct 
CFD-driven optimization is discussed in Section 9.4, whereas the SBO method- 
ologies are presented in Section 9.5. Section 9.6 concludes the chapter. 

9.2 Problem Formulation 

Aerodynamic design optimization includes a variety of specific problems ranging 
from two-dimensional airfoil shape optimization [13] to three-dimensional wing 
(or blade) design [4], involving one or several objective functions [19, 39], as well 
as one or multiple operating conditions [15, 27]. In this chapter, in order to high- 
light the formulation, design challenges and solution methodologies, we concen- 
trate on airfoil shape optimization for one representative operating condition, and 
with a single objective function. 

An airfoil is a streamlined aerodynamic surface such as the one shown in Fig. 9.2. 
The length of the airfoil is called the chord and is denoted by c. The thickness, denoted by t, varies along the chord line. The camber (curvature), described by the mean camber line, also varies along the chord line. The leading-edge (LE) is
normally rounded and the trailing-edge (TE) is normally sharp, either closed or open. 



Fig. 9.2 A single-element airfoil section (solid line) of chord length c and thickness t. V∞ is at an angle of attack α relative to the x-axis. F is the force acting on the airfoil due to the airflow, where the component perpendicular to V∞ is called the lift force l and the component parallel to V∞ is called the drag force d. p is the pressure acting normal to a surface element of length ds, and τ is the viscous wall shear stress acting parallel to the surface element. θ is the angle that p and τ make relative to the z- and x-axis, respectively, positive clockwise
The function of the airfoil is to generate a lift force l (a force component perpendicular to the free-stream) over a range of operating conditions (Mach number M∞, Reynolds number, angle of attack α). Normally, the drag force d (a force component parallel to the free-stream) is to be minimized. These forces are due to the pressure distribution p (acting normal to the surface) and the shear stress distribution τ (acting parallel to the surface) over the surface of the airfoil. A detailed description of their calculation is given in Section 9.3.2.4. The forces are usually written in non-dimensional form, as the lift coefficient and the drag coefficient. The lift coefficient is defined as



$$C_l = \frac{l}{q_\infty S} \tag{9.1}$$

where l is the magnitude of the lift force, q∞ = (1/2)ρ∞V∞² is the dynamic pressure, ρ∞ is the air density, V∞ is the free-stream velocity, and S is a reference area. For a two-dimensional airfoil, the reference area is taken to be the chord length multiplied by a unit depth, i.e., S = c. Similarly, the drag coefficient is defined as

$$C_d = \frac{d}{q_\infty S} \tag{9.2}$$

where d is the magnitude of the drag force.
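Definitions (9.1) and (9.2) are easy to check numerically. The sketch below assumes forces per unit span, so that S = c × 1. The numbers in the example are illustrative only: sea-level density, 50 m/s and a 1 m chord.

```python
def dynamic_pressure(rho_inf: float, v_inf: float) -> float:
    """q_inf = 0.5 * rho_inf * V_inf**2."""
    return 0.5 * rho_inf * v_inf ** 2

def force_coefficients(lift: float, drag: float, rho_inf: float,
                       v_inf: float, chord: float) -> tuple[float, float]:
    """C_l and C_d from Eqs. (9.1)-(9.2) for a 2D airfoil.

    lift and drag are forces per unit span [N/m]; the reference area is
    the chord times a unit depth, S = c.
    """
    q_inf = dynamic_pressure(rho_inf, v_inf)
    s_ref = chord * 1.0  # unit depth
    return lift / (q_inf * s_ref), drag / (q_inf * s_ref)

# Illustrative input: q_inf = 0.5 * 1.225 * 50**2 = 1531.25 Pa
cl, cd = force_coefficients(lift=1531.25, drag=15.3125,
                            rho_inf=1.225, v_inf=50.0, chord=1.0)
print(cl, cd)
```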

There are two main approaches to airfoil design. One is to design the airfoil 
section in order to maximize its performance. This is called direct design, and the 
most common design setups include lift maximization, drag minimization, and 
lift-to-drag ratio maximization [14]. Another way is to define a priori a specific 
flow behavior that is to be attained. The airfoil shape is then designed to achieve 
this flow behavior. This is called inverse design and, typically, a target airfoil 
surface pressure distribution is prescribed [40]. 

Table 9.1 Typical problem formulations for two-dimensional airfoil shape optimization. Additionally, a constraint on the minimum allowable airfoil cross-sectional area is included. C_p* denotes the target pressure distribution.

Case                 f(x)                              g1(x)
Lift maximization    −C_l(x)                           C_d(x) − C_d,max ≤ 0
Drag minimization    C_d(x)                            C_l,min − C_l(x) ≤ 0
L/D maximization     −C_l(x)/C_d(x)                    C_l,min − C_l(x) ≤ 0
Inverse design       (1/2) ∫ [C_p(x) − C_p*]² ds       –





In general, aerodynamic shape optimization can be formulated as a nonlinear 
minimization problem, i.e., for a given operating condition, solve 
$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{s.t.} \quad g_j(\mathbf{x}) \le 0, \quad \mathbf{l} \le \mathbf{x} \le \mathbf{u} \tag{9.3}$$



where f(x) is the objective function, x is the design variable vector (parameters describing the airfoil shape), g_j(x) are the design constraints, and l and u are the lower and upper bounds, respectively. The detailed formulation depends on the particular design problem. Typical problem formulations for two-dimensional airfoil optimization are listed in Table 9.1. Additional constraints are often prescribed. For example, to account for the wing structural components inside the airfoil, one sets a constraint on the airfoil cross-sectional area, which can be formally written as g_2(x) = A_min − A(x) ≤ 0, where A(x) is the cross-sectional area of the airfoil for the design vector x and A_min is the minimum allowable cross-sectional area. Other constraints can be included depending on the design situation, e.g., a maximum pitching moment or a maximum local allowable pressure coefficient [41].
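The structure of formulation (9.3) can be shown with a toy version of the drag-minimization row of Table 9.1, plus the area constraint g_2. The sketch uses a brute-force grid search over the box l ≤ x ≤ u as a stand-in optimizer. The response functions cd, cl and area are hypothetical and cheap, and only mimic plausible trends for x = (m, p, t). They are not CFD results.

```python
import numpy as np

# Hypothetical stand-ins for CFD responses (NOT real aerodynamics).
def cd(m, p, t):
    return 0.006 + 0.5 * m**2 + 0.05 * t**2 + 0.002 * (p - 0.4) ** 2

def cl(m, p, t):
    return 0.1 + 20.0 * m

def area(m, p, t):
    return 0.685 * t  # approx. NACA four-digit section area, unit chord

CL_MIN, A_MIN = 0.5, 0.06
lower = np.array([0.0, 0.2, 0.06])   # l
upper = np.array([0.06, 0.6, 0.18])  # u

# Constraints written as g_j(x) <= 0, as in Eq. (9.3).
constraints = [
    lambda m, p, t: CL_MIN - cl(m, p, t),   # g1: minimum lift
    lambda m, p, t: A_MIN - area(m, p, t),  # g2: minimum cross-sectional area
]

# Brute-force search on a grid spanning the bounds l <= x <= u.
axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(lower, upper, (61, 41, 121))]
M, P, T = np.meshgrid(*axes, indexing="ij")
feasible = np.all([g(M, P, T) <= 0.0 for g in constraints], axis=0)
objective = np.where(feasible, cd(M, P, T), np.inf)  # infeasible -> +inf
best = np.unravel_index(np.argmin(objective), objective.shape)
x_best = np.array([M[best], P[best], T[best]])
print(x_best, cd(*x_best))
```

With an expensive CFD model, exhaustive sampling like this is exactly what one cannot afford. That cost is why gradient-based and surrogate-based methods are used in practice.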

An aircraft wing and a turbomachinery blade are three-dimensional aerodynamic surfaces. A schematic of a typical wing (or blade) planform is shown in Fig. 9.3, where, at each spanstation (numbered 1 through 4), the wing cross-section is defined by an airfoil shape. The number of spanstations can be smaller or larger than four, depending on the design scenario. Between adjacent stations, the wing surface is formed by a straight-line wrap. Parameters controlling the planform shape include the wing
span, the quarter-chord wing sweep angle, the chord lengths and thickness-to- 
chord ratio at each spanstation, the wing taper ratio, and the twist distribution. 

Numerical design optimization of the three-dimensional wing (or blade) is per- 
formed in a similar fashion as for the two-dimensional airfoil [4]. In the problem for- 
mulation, the section lift and drag coefficients are replaced by the overall lift and drag 
coefficients. However, the number of design variables is much larger and the fluid 
flow domain is three-dimensional. These factors increase the computational burden, 
and the setup of the optimization process becomes even more important [4]. 




Fig. 9.3 A schematic of a wing planform of semi-span b/2 and quarter-chord sweep angle Λ. Other planform parameters (not shown) are the taper ratio (ratio of tip chord to root chord) and the twist distribution. V∞ is the free-stream speed
The problem formulations presented above apply to a single operating
condition (Mach number, Reynolds number, angle of attack) and a single objec- 
tive function. An airfoil optimized for a single operating point may have severe 
performance degradation for off-design points, or, in some cases, even a small de- 
viation from the design point could result in a dramatic change in the lift and drag 
coefficients [15, 19, 28]. 

Robust optimization is employed to improve the overall wing/blade performance at the optimal solution, in particular, to make it insensitive to small perturbations of the design variables or the operating conditions. In robust optimization, the objective is to achieve consistent performance improvement over a given range of uncertain parameters. Examples of work on this subject include airfoil optimization for a consistent drag reduction over a range of Mach numbers [28, 39], also called multi-point optimization, and the aerodynamic design of a turbine blade airfoil shape taking into account the performance degradation due to manufacturing uncertainties [42].

In many cases, several (often competing) objectives may have to be considered 
at the same time. This is referred to as multi-objective design optimization. For 
example, during the take-off and landing of an aircraft, limits on the external noise 
are becoming increasingly important, and have to be accounted for in the design 
process [8, 9]. Furthermore, the search for robust airfoil designs can be treated as multi-objective optimization, i.e., maximizing robustness and performance simultaneously (since these objectives are very likely to conflict) [42].

9.3 Computational Fluid Dynamic Modeling 

This section presents a brief introduction to the elements of a CFD analysis. We 
introduce the governing fluid flow equations and explain the hierarchy of simpli- 
fied forms of the governing equations which are commonly used in aerodynamic 
design. The CFD process is then illustrated with an example two-dimensional 
simulation of the flow past an airfoil at transonic flow conditions. 

9.3.1 Governing Equations 

The fluid flow past an aerodynamic surface is governed by the Navier-Stokes equations. For compressible viscous flow of a Newtonian fluid in two dimensions, without body forces, mass diffusion, finite-rate chemical reactions, heat conduction, or external heat addition, the Navier-Stokes equations can be written in Cartesian coordinates as [24]

$$\frac{\partial \mathbf{U}}{\partial t} + \frac{\partial \mathbf{E}}{\partial x} + \frac{\partial \mathbf{F}}{\partial y} = 0 \tag{9.4}$$
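where U, E, and F are vectors given by

$$\mathbf{U} = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ E_t \end{bmatrix}, \quad
\mathbf{E} = \begin{bmatrix} \rho u \\ \rho u^2 + p - \tau_{xx} \\ \rho u v - \tau_{xy} \\ (E_t + p)u - u\tau_{xx} - v\tau_{xy} \end{bmatrix}, \quad
\mathbf{F} = \begin{bmatrix} \rho v \\ \rho u v - \tau_{xy} \\ \rho v^2 + p - \tau_{yy} \\ (E_t + p)v - u\tau_{xy} - v\tau_{yy} \end{bmatrix} \tag{9.5}$$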



Here, ρ is the fluid density, u and v are the x and y velocity components, respectively, p is the static pressure, E_t = ρ(e + V²/2) is the total energy per unit volume, e is the internal energy per unit mass, V²/2 is the kinetic energy per unit mass (with V² = u² + v²), and τ is the viscous shear stress tensor given by [24]
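$$\boldsymbol{\tau} = \begin{bmatrix} \tau_{xx} & \tau_{xy} \\ \tau_{xy} & \tau_{yy} \end{bmatrix}
= \mu \begin{bmatrix} 2\dfrac{\partial u}{\partial x} - \dfrac{2}{3}\nabla\cdot\mathbf{V} & \dfrac{\partial u}{\partial y} + \dfrac{\partial v}{\partial x} \\[1ex] \dfrac{\partial u}{\partial y} + \dfrac{\partial v}{\partial x} & 2\dfrac{\partial v}{\partial y} - \dfrac{2}{3}\nabla\cdot\mathbf{V} \end{bmatrix},
\qquad \nabla\cdot\mathbf{V} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \tag{9.6}$$

where μ is the dynamic viscosity of the fluid.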

The first row of Eq. (9.4) corresponds to the continuity equation, the second and third rows are the momentum equations, and the fourth row is the energy equation. These four scalar equations contain five unknowns, namely (ρ, p, e, u, v). An equation of state is needed to close the system of equations. For most problems in gas dynamics, it is possible to assume a perfect gas, which is defined as a gas whose intermolecular forces are negligible. A perfect gas obeys the perfect gas equation of state [24]



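$$p = \rho R T \tag{9.7}$$

where R is the specific gas constant and T is the temperature. For a calorically perfect gas, the internal energy is related to the temperature by e = c_v T, where c_v is the specific heat at constant volume, which closes the system.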
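As a concrete reading of Eqs. (9.5) and (9.7), the sketch below builds the conservative vector U and the inviscid part of the flux E (τ = 0) from primitive variables. The air properties are assumptions: a calorically perfect gas with γ = 1.4 and R = 287.05 J/(kg K), so e = p/((γ − 1)ρ). The function names are illustrative.

```python
import numpy as np

GAMMA = 1.4      # ratio of specific heats for air (assumed)
R_AIR = 287.05   # specific gas constant for air [J/(kg K)] (assumed)

def density_from_state(p: float, T: float, R: float = R_AIR) -> float:
    """Perfect gas equation of state (9.7): p = rho * R * T."""
    return p / (R * T)

def conservative_and_flux(rho, u, v, p, gamma=GAMMA):
    """U and the inviscid x-flux E of Eq. (9.5), i.e. with tau = 0.

    Assumes a calorically perfect gas: e = p / ((gamma - 1) * rho).
    """
    e = p / ((gamma - 1.0) * rho)            # internal energy per unit mass
    E_t = rho * (e + 0.5 * (u**2 + v**2))     # total energy per unit volume
    U = np.array([rho, rho * u, rho * v, E_t])
    E = np.array([rho * u, rho * u**2 + p, rho * u * v, (E_t + p) * u])
    return U, E

rho = density_from_state(101325.0, 288.15)   # ISA sea level, about 1.225 kg/m^3
U, E = conservative_and_flux(rho, u=250.0, v=0.0, p=101325.0)
print(rho, U, E)
```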

The governing equations are a set of coupled, highly nonlinear partial differential equations. The numerical solution of these equations is quite challenging. What complicates things even further is that flows become turbulent above a critical value of the Reynolds number Re = VL/ν, where V and L are representative values of velocity and length scales and ν is the kinematic viscosity. Turbulent flows are characterized by the appearance of statistical fluctuations of all the variables (ρ, p, e, u, v) around mean values.

By making appropriate assumptions about the fluid flow, the governing equations can be simplified and their numerical solution becomes computationally less expensive. In general, the resulting models fall into two groups, depending on whether the effects of viscosity are neglected or included in the analysis. The hierarchy of the governing flow equations, depending on the assumptions made about the fluid flow situation, is shown in Fig. 9.4.
Fig. 9.4 A hierarchy of the governing fluid flow equations with the associated assumptions and approximations

Direct Numerical Simulation (DNS) aims to simulate the whole range of the turbulent statistical fluctuations at all relevant physical scales. This is a formidable challenge, which grows with increasing Reynolds number, as the total computational effort for DNS simulations is proportional to Re³ for homogeneous turbulence [25]. Due to limitations of computational capabilities, DNS is not feasible for typical engineering flows such as those encountered in airfoil design for typical aircraft and turbomachinery, i.e., with Reynolds numbers from 10⁵ to 10⁷.

Large-Eddy Simulation (LES) is of the same category as DNS, in that it computes the turbulent fluctuations in space and time directly, but only above a certain length scale. Below that scale, the turbulence is modeled by semi-empirical laws. The total computational effort for LES simulations is proportional to Re^(9/4), which is significantly lower than for DNS [25]. However, it is still excessively high for large Reynolds number applications.
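Only the exponents are given above, so a small calculation shows how quickly the two cost estimates diverge. The proportionality constants are unknown and omitted. The ratios therefore only indicate scaling trends, not actual run times.

```python
def cost_scaling(re: float, exponent: float) -> float:
    """Relative cost ~ Re**exponent; the unknown constant factor is omitted."""
    return re ** exponent

DNS_EXP, LES_EXP = 3.0, 9.0 / 4.0
for re in (1e5, 1e6, 1e7):
    ratio = cost_scaling(re, DNS_EXP) / cost_scaling(re, LES_EXP)
    print(f"Re = {re:.0e}: DNS/LES cost ratio ~ {ratio:.1e} (up to constants)")

# Doubling Re multiplies the DNS cost by 2**3 = 8 and the LES cost by
# 2**2.25 (about 4.8).
print(cost_scaling(2.0, DNS_EXP), cost_scaling(2.0, LES_EXP))
```

The ratio grows like Re^(3/4). This is why LES is more affordable than DNS but still far from cheap at the Reynolds numbers quoted for airfoil design.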

The Reynolds equations (also called the Reynolds-averaged Navier-Stokes (RANS) equations) are obtained by decomposing each turbulent quantity into its mean and fluctuating components and time-averaging the governing equations. The effect of the turbulent fluctuations on the mean flow must then be represented by turbulence models. As a result, a loss in accuracy is introduced, since the available turbulence models are not universal. A widely used turbulence model for simulation of the flow past airfoils and wings is the Spalart-Allmaras one-equation turbulence model [43]. The model was developed for aerospace applications and is considered to be accurate for attached wall-bounded flows and flows with mild separation and recirculation. Nevertheless, the RANS approach retains the viscous effects in the fluid flow and, at the same time, significantly reduces the computational effort, since there is no need to resolve all the turbulent scales (as is done in DNS and, partially, in LES). This approach is currently the most widely applied approximation in CFD practice. It can be applied both to low-speed design, such as the take-off and landing conditions of an aircraft, and to high-speed design [25].

The inviscid flow assumption leads to the Euler equations. These equations hold, in the absence of separation and other strong viscous effects, for any shape of the body, thick or thin, and at any angle of attack [44]. Shock waves appear in transonic flow where the flow goes from supersonic to subsonic. Across the shock, there is an almost discontinuous increase in pressure, temperature, density, and entropy, but a decrease in Mach number (from supersonic to subsonic). The shock is termed weak if the change in pressure is small, and strong if the change in pressure is large. The entropy change is of third order in the shock strength. If the shocks are weak, the entropy change across them is small, and the flow can be assumed to be isentropic. This, in turn, allows for the assumption of irrotational flow. Then, the Euler equations reduce to a single nonlinear partial differential equation, called the full potential equation (FPE). In the case of a slender body at a small angle of attack, we can make the assumption of a small disturbance. Then, the FPE becomes the transonic small-disturbance equation (TSDE). These three sets of equations, i.e., the Euler equations, the FPE, and the TSDE, represent a hierarchy of models for the analysis of inviscid, transonic flow past airfoils [44]. The Euler equations are exact for inviscid flow, while the FPE is an approximation to them (weak shocks), and the TSDE is a further approximation (thin airfoils at small angle of attack). These approaches can be applied effectively for high-speed
design, such as the cruise design of transport aircraft wings [13, 14] and the design 
of turbomachinery blades [2]. 
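The statement that the entropy change is third order in the shock strength can be checked numerically. The sketch uses the standard normal-shock relations for a calorically perfect gas (γ = 1.4 assumed). It compares the exact entropy jump with the classical leading-order weak-shock estimate (s2 − s1)/R ≈ (γ + 1)/(12γ²)(Δp/p1)³.

```python
import math

def normal_shock_entropy_jump(M1: float, gamma: float = 1.4) -> tuple[float, float]:
    """Return (dp/p1, (s2 - s1)/R) across a normal shock with upstream Mach M1 > 1."""
    m2 = M1 * M1
    p_ratio = 1.0 + 2.0 * gamma / (gamma + 1.0) * (m2 - 1.0)
    rho_ratio = (gamma + 1.0) * m2 / ((gamma - 1.0) * m2 + 2.0)
    # s = c_v ln(p / rho**gamma) + const, with c_v = R / (gamma - 1)
    ds_over_R = (math.log(p_ratio) - gamma * math.log(rho_ratio)) / (gamma - 1.0)
    return p_ratio - 1.0, ds_over_R

gamma = 1.4
for M1 in (1.01, 1.05, 1.1, 1.3):
    dp, ds = normal_shock_entropy_jump(M1, gamma)
    estimate = (gamma + 1.0) / (12.0 * gamma**2) * dp**3  # weak-shock theory
    print(f"M1={M1}: dp/p1={dp:.4f}, ds/R={ds:.3e}, cubic estimate={estimate:.3e}")
```

Halving the pressure jump reduces the entropy jump by roughly a factor of eight. This is why weak shocks can be treated as isentropic in the full potential approximation.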

There are numerous airfoil and wing models that are not typical CFD models, 
but they are nevertheless widely used in aerodynamic design. Examples of such 
methods include thin airfoil theory, lifting line theory (unswept wings), vortex lat- 
tice methods (wings), and panel methods (airfoils and wings). These methods are 
out of the scope of this chapter, but the interested reader is directed to [45] and 
[46] for the details. In the following section, we describe the elements of a typical 
CFD simulation of the RANS or Euler equations. 

9.3.2 Numerical Modeling 

In general, a single CFD simulation is composed of four steps, as shown in Fig. 9.5: the geometry generation, meshing of the solution domain, numerical solution of the governing fluid flow equations, and post-processing of the flow results, which involves, in the case of numerical optimization, calculating the
objectives and constraints. We discuss each step of the CFD process and illustrate 
it by giving an example two-dimensional simulation of the flow past the NACA 
2412 airfoil at transonic flow conditions. 

9.3.2.1 Geometry 

Several methods are available for describing the airfoil shape numerically, each 
with its own benefits and drawbacks. In general, these methods are based on two 
different approaches, either the airfoil shape itself is parameterized, or, given an 
initial airfoil shape, the shape deformation is parameterized. 
Fig. 9.5 Elements of a single CFD simulation in numerical airfoil shape optimization

The shape deformation approach is usually performed in two steps. First, the 
surface of the airfoil is deformed by adding values computed from certain 
functions to the upper and lower sides of the surfaces. Several different types of 
functions can be considered, such as the Hicks-Henne bump functions [1], or the 
transformed cosine functions [47]. After deforming the airfoil surface, the compu- 
tational grid needs to be regenerated. Either the whole grid is regenerated based on 
the airfoil shape deformation, or the grid is deformed locally, accounting for the 
airfoil shape deformation. The latter is computationally more efficient. An exam- 
ple grid deformation method is the volume spline method [47]. In some cases, the 
first step described above is skipped, and the grid points on the airfoil surface
are used directly for the shape deformation [14]. 
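A minimal sketch of the surface-deformation step follows. It uses the commonly quoted form of the Hicks-Henne bump, b(x) = sin^w(π x^(ln 0.5 / ln h)), which peaks with value 1 at x = h and vanishes at both ends of a unit chord. The bump width exponent w, the peak locations and the amplitudes (the design variables) are illustrative assumptions, not the setup of [1].

```python
import numpy as np

def hicks_henne_bump(x: np.ndarray, h: float, w: float = 3.0) -> np.ndarray:
    """Hicks-Henne bump sin^w(pi * x**(ln 0.5 / ln h)) for 0 <= x <= 1.

    h (0 < h < 1) is the chordwise location of the peak (value 1);
    larger w gives a narrower bump.
    """
    exponent = np.log(0.5) / np.log(h)
    return np.sin(np.pi * np.power(x, exponent)) ** w

def deform_surface(x, y, amplitudes, peaks, w=3.0):
    """Add a weighted sum of bumps to one surface's ordinates y(x).

    The amplitudes act as design variables (hypothetical setup).
    """
    dy = sum(a * hicks_henne_bump(x, h, w) for a, h in zip(amplitudes, peaks))
    return y + dy

x = np.linspace(0.0, 1.0, 101)
y_new = deform_surface(x, np.zeros_like(x), amplitudes=[0.01, -0.005],
                       peaks=[0.3, 0.7])
print(y_new.max(), y_new.min())
```

In a complete setup, the same bumps would be applied to the upper and lower surfaces with separate amplitudes, and the grid would then be regenerated or deformed as described above.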

Numerous airfoil shape parameterization methods have been developed. The 
earliest development of parameterized airfoil sections was performed by the National Advisory Committee for Aeronautics (NACA) in the 1930s [48]. Their development was derived from wind tunnel experiments and, therefore, the shapes generated by this method are limited to the families covered by those investigations. However, only three parameters are required to describe their shape. Nowadays, the most widely used airfoil shape parameterization methods are Non-Uniform Rational B-Splines (NURBS) [27] and Bezier curves [49] (a special case of NURBS). These methods use a set of control points to define the airfoil shape and are general enough that (nearly) any airfoil shape can be generated. In numerical optimization, these control points are used as design variables. They provide sufficient control of the shape that local changes on the upper and lower surfaces can be made separately. The number of control points varies depending on how accurately the shape is to be controlled. NURBS requires as few as thirteen control points to represent a large family of airfoils [27]. Other parameterization methods include the PARSEC method [50] and the Bezier-PARSEC method [51]. PARSEC uses 11 specific airfoil geometry parameters, such as the leading-edge radius and the upper and lower crest locations including curvature. Bezier-PARSEC combines the Bezier and PARSEC methods.
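To illustrate how control points define a shape, here is a minimal Bezier evaluation through Bernstein polynomials, B(s) = Σ C(n,i) s^i (1 − s)^(n−i) P_i for 0 ≤ s ≤ 1. The control points below describe a hypothetical upper surface only. They are not a parameterization taken from [49].

```python
from math import comb

import numpy as np

def bezier_curve(control_points, n_samples: int = 101) -> np.ndarray:
    """Evaluate a Bezier curve at n_samples parameter values s in [0, 1]."""
    P = np.asarray(control_points, dtype=float)   # shape (n + 1, 2)
    n = len(P) - 1
    s = np.linspace(0.0, 1.0, n_samples)[:, None]
    # Bernstein basis, one column per control point: shape (n_samples, n + 1)
    basis = np.hstack([comb(n, i) * s**i * (1.0 - s) ** (n - i)
                       for i in range(n + 1)])
    return basis @ P                               # shape (n_samples, 2)

# Hypothetical upper-surface control points (x, y) for a unit chord; the
# first two share x = 0 to give a vertical tangent (rounded LE).
P_upper = np.array([[0.0, 0.0], [0.0, 0.05], [0.3, 0.08], [0.7, 0.04], [1.0, 0.0]])
curve = bezier_curve(P_upper)
print(curve[:3])
```

Moving a single interior control point changes the curve smoothly. This is why control-point coordinates are convenient design variables.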

In this chapter, for the sake of simplicity, we use the NACA airfoil shapes [48] to illustrate some variable-fidelity optimization methods. In particular, we use the NACA four-digit airfoil parameterization method, where the airfoil shape is defined by three parameters: m (the maximum ordinate of the mean camber line as a fraction of chord), p (the chordwise position of the maximum ordinate, as a fraction of chord) and t/c (the thickness-to-chord ratio). The airfoils are denoted NACA mpxx. The first digit is 100m, the second digit is 10p, and the last two digits xx give 100(t/c). For example, NACA 2412 has m = 0.02, p = 0.4 and t/c = 0.12.

The NACA airfoils are constructed by combining a thickness function y_t(x) with a mean camber line function y_c(x), with all lengths normalized by the chord. The x-coordinates are [48]

$$x_{u,l} = x \mp y_t \sin\theta \tag{9.8}$$

and the y-coordinates are

$$y_{u,l} = y_c \pm y_t \cos\theta \tag{9.9}$$

where u and l refer to the upper and lower surfaces, respectively (the upper sign applies to the upper surface), y_t(x) is the thickness function, y_c(x) is the mean camber line function, and

$$\theta = \tan^{-1}\left(\frac{dy_c}{dx}\right) \tag{9.10}$$

is the local slope angle of the mean camber line. The NACA four-digit thickness distribution is given by

$$y_t = t\left(a_0\sqrt{x} - a_1 x - a_2 x^2 + a_3 x^3 - a_4 x^4\right) \tag{9.11}$$

where a_0 = 1.4845, a_1 = 0.6300, a_2 = 1.7580, a_3 = 1.4215, a_4 = 0.5075, and t is the maximum thickness. The mean camber line is given by

$$y_c = \begin{cases} \dfrac{m}{p^2}\left(2px - x^2\right), & x < p \\[1ex] \dfrac{m}{(1-p)^2}\left(1 - 2p + 2px - x^2\right), & x \ge p \end{cases} \tag{9.12}$$

Three example NACA four-digit airfoils are shown in Fig. 9.6. 
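The construction above translates almost line by line into code. The sketch below implements Eqs. (9.8) to (9.12) for a unit chord. Cosine spacing of the x stations is an assumed (common) choice that clusters points near the leading and trailing edges. The camber and slope formulas are only evaluated when m and p are non-zero, to avoid dividing by p = 0 for symmetric sections.

```python
import numpy as np

# Thickness coefficients a0..a4 of Eq. (9.11)
A_COEFFS = (1.4845, 0.6300, 1.7580, 1.4215, 0.5075)

def naca4(m: float, p: float, t: float, n: int = 101):
    """Upper/lower surface coordinates of a NACA four-digit airfoil.

    m: maximum camber, p: its chordwise position, t: thickness-to-chord
    ratio (fractions of a unit chord). Returns (xu, yu, xl, yl).
    """
    beta = np.linspace(0.0, np.pi, n)
    x = 0.5 * (1.0 - np.cos(beta))  # cosine spacing: more points near LE and TE
    a0, a1, a2, a3, a4 = A_COEFFS
    yt = t * (a0 * np.sqrt(x) - a1 * x - a2 * x**2 + a3 * x**3 - a4 * x**4)

    if m == 0.0 or p == 0.0:  # symmetric section: no camber
        yc = np.zeros_like(x)
        dyc_dx = np.zeros_like(x)
    else:
        fore = x < p
        yc = np.where(fore, m / p**2 * (2 * p * x - x**2),
                      m / (1 - p) ** 2 * (1 - 2 * p + 2 * p * x - x**2))
        dyc_dx = np.where(fore, 2 * m / p**2 * (p - x),
                          2 * m / (1 - p) ** 2 * (p - x))
    theta = np.arctan(dyc_dx)  # Eq. (9.10)

    xu, yu = x - yt * np.sin(theta), yc + yt * np.cos(theta)
    xl, yl = x + yt * np.sin(theta), yc - yt * np.cos(theta)
    return xu, yu, xl, yl

xu, yu, xl, yl = naca4(0.02, 0.4, 0.12)   # NACA 2412
```

With a_4 = 0.5075, y_t(1) = 0.0105 t. The trailing edge is therefore slightly open rather than closed.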
Fig. 9.6 Three different NACA four-digit airfoil sections: NACA 0012 (m = 0, p = 0, t/c = 0.12), solid line; NACA 2412 (m = 0.02, p = 0.4, t/c = 0.12), dashed line; NACA 4608 (m = 0.04, p = 0.6, t/c = 0.08), dash-dot line

9.3.2.2 Computational Grid 

The governing equations are solved on a computational grid. The grid needs to re- 
solve the entire solution domain, as well as the detailed airfoil geometry. Further- 
more, the grid needs to be sufficiently fine to capture the flow physics accurately. 
For example, a fine grid resolution is necessary near the airfoil surface, especially 
near the LE where the flow gradients are large. Also, if viscous effects are in- 
cluded, then the grid needs to be fine near the entire airfoil surface (and any other 
wall surface in the solution domain). The grid can be much coarser several chord 
lengths away from the airfoil and in the farfield. For a detailed discussion on grid 
generation the reader is referred to [24] and [25]. 

For illustration purposes, a typical grid for an airfoil used in aircraft design, 
generated using the computer program ICEM CFD [52], is shown in Fig. 9.7. This 
is a structured curvilinear body-fitted grid of C-topology (a topology that can be associated with the letter C, i.e., at the inlet the grid surrounds the leading-edge of the airfoil, but is open at the other end). The computational region is made large enough that the farfield boundaries do not affect the flow solution. In this case, there are
24 chord lengths in front of the airfoil, 50 chord lengths behind it, and 25 chord 
lengths above and below it. The airfoil leading-edge (LE) is located at the origin. 

9.3.2.3 Flow Solution 

Most commercially available CFD flow solvers are based on the Finite Volume Method (FVM). In the FVM, the grid subdivides the solution domain into a finite number of small control volumes (cells). The grid defines the boundaries of the control volumes, while the computational node lies at the center of each control volume. Integral conservation of mass, momentum, and energy is satisfied exactly over each control volume. The result is a set of algebraic equations for each control volume, which are linearized in the solution process. The set of equations is then solved iteratively or simultaneously. Iterative solution is usually performed with relaxation to suppress numerical oscillations in the flow solution that result from numerical errors. The iterative process is repeated until the change in the flow variables between two subsequent iterations becomes smaller than the prescribed convergence threshold. Further reading on the FVM and solution procedures can be found in [24, 25].

The iterative convergence is normally examined by monitoring the overall residual. This is the sum, over all cells in the computational domain, of the L2 norm of the residuals of all the governing equations solved in each cell. Moreover, the lift and drag force coefficients are monitored for convergence, since these are the figures of interest in airfoil shape optimization.
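The iterate-with-relaxation procedure and its convergence monitoring can be sketched on a much simpler problem than the flow equations. The example below rests on stated assumptions. It is a cell-centred finite-volume discretization of 1D steady diffusion, d/dx(Γ dφ/dx) = 0 with fixed end values, solved by under-relaxed Gauss-Seidel. It stops when the largest change between two subsequent iterations falls below a threshold, and it records the L2 norm of the discrete residual at every iteration. It is not how FLUENT or any particular solver is implemented.

```python
import numpy as np

def solve_diffusion_fvm(n_cells=20, length=1.0, gamma=1.0, phi_a=100.0,
                        phi_b=500.0, omega=0.8, tol=1e-8, max_iter=50_000):
    """1D steady diffusion d/dx(gamma dphi/dx) = 0 on [0, length] with
    phi(0) = phi_a and phi(length) = phi_b, cell-centred finite volumes."""
    dx = length / n_cells
    aw = np.full(n_cells, gamma / dx)   # west-face coefficients
    ae = np.full(n_cells, gamma / dx)   # east-face coefficients
    sp = np.zeros(n_cells)              # boundary-face contributions to a_P
    b = np.zeros(n_cells)               # sources from the boundary values
    # Boundary faces lie dx/2 from the first and last cell centres.
    aw[0], sp[0], b[0] = 0.0, 2 * gamma / dx, 2 * gamma / dx * phi_a
    ae[-1], sp[-1], b[-1] = 0.0, 2 * gamma / dx, 2 * gamma / dx * phi_b
    ap = aw + ae + sp

    phi = np.zeros(n_cells)
    residual_history = []
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n_cells):          # one Gauss-Seidel sweep
            west = phi[i - 1] if i > 0 else 0.0
            east = phi[i + 1] if i < n_cells - 1 else 0.0
            phi_gs = (aw[i] * west + ae[i] * east + b[i]) / ap[i]
            new = phi[i] + omega * (phi_gs - phi[i])   # under-relaxation
            max_change = max(max_change, abs(new - phi[i]))
            phi[i] = new
        # L2 norm of the residual of the discrete equations
        west_nb = np.concatenate(([0.0], phi[:-1]))
        east_nb = np.concatenate((phi[1:], [0.0]))
        residual = aw * west_nb + ae * east_nb + b - ap * phi
        residual_history.append(float(np.linalg.norm(residual)))
        if max_change < tol:              # change between two iterations
            break
    x = (np.arange(n_cells) + 0.5) * dx   # cell-centre coordinates
    return x, phi, residual_history

x, phi, history = solve_diffusion_fvm()
print(len(history), history[-1])
```

For this problem the converged discrete solution is the exact linear profile. That makes it easy to confirm that the stopping rule does not terminate too early.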
Fig. 9.7 (a) An example computational grid for the NACA 0012 airfoil with a C-topology, (b) a view of the computational grid close to the airfoil



As an illustration, we consider the FVM-based computer code FLUENT [22] for the fluid flow simulations. Compressible inviscid flow past the NACA 2412 airfoil at Mach number M∞ = 0.75 and an angle of attack α = 1 degree is simulated using the Euler equations and a grid similar to the one shown in Fig. 9.7. The convergence of the residuals for mass, momentum, and energy is shown in Fig. 9.8(a), and the convergence of the lift and drag coefficients is shown in Fig. 9.8(b). The residual threshold for convergence was set to 10⁻⁶. The solver needed 216 iterations to reach full convergence of the flow solution. However, only about 50 iterations are necessary for the lift and drag coefficients to converge. The Mach number contour plot of the flow field around the airfoil is shown in Fig. 9.9(a), and the pressure distribution on the airfoil surface is shown in Fig. 9.9(b). On the upper surface there is a strong shock with an associated decrease in flow speed and increase in pressure.
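Because the force coefficients settle long before the residuals, an optimization loop could in principle stop a simulation once C_l and C_d have plateaued. The sketch below shows one simple plateau test. The window length, the tolerance and the synthetic history are assumptions made for illustration, and they are not the data of Fig. 9.8.

```python
import math

def coefficient_converged(history, window=10, rel_tol=1e-3):
    """True if the last `window` values vary by less than rel_tol relative
    to the most recent value (a simple plateau test)."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return (max(recent) - min(recent)) <= rel_tol * abs(recent[-1])

# Synthetic lift history approaching 0.67 (made-up data, not Fig. 9.8).
cl_history = [0.67 * (1.0 - math.exp(-k / 8.0)) for k in range(1, 217)]
first = next(k for k in range(1, len(cl_history) + 1)
             if coefficient_converged(cl_history[:k]))
print("plateau detected after", first, "iterations")
```

In practice the same test would be applied to both C_l and C_d. A plateau test can also be fooled by slowly drifting or oscillating histories, so the tolerance must be chosen with care.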
Fig. 9.8 (a) Convergence history of the simulation of the flow past the NACA 2412 at M∞ = 0.75 and α = 1 deg., (b) convergence of the lift and drag coefficients. The converged values are C_l = 0.67 for the lift coefficient and C_d = 0.0261 for the drag coefficient
9.3.2.4 Aerodynamic Forces

The aerodynamic forces are calculated by integrating the pressure (p) and the viscous wall shear stress (τ), as defined in Fig. 9.2, over the surface of the airfoil. The pressure coefficient is defined as

$$C_p = \frac{p - p_\infty}{q_\infty} \tag{9.13}$$
Fig. 9.9 (a) Mach contour plot of the flow past the NACA 2412 at M∞ = 0.75 and α = 1 deg., (b) the pressure distribution on the surface of the airfoil. The lift coefficient is C_l = 0.67 and the drag coefficient is C_d = 0.0261
where p∞ is the free-stream pressure. Similarly, the shear stress coefficient is defined as

$$C_f = \frac{\tau}{q_\infty} \tag{9.14}$$

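The normal force coefficient (parallel to the z-axis) acting on the airfoil is [46]

$$C_n = \frac{1}{c}\left[-\int_{\mathrm{LE}}^{\mathrm{TE}}\left(C_{p,u}\cos\theta + C_{f,u}\sin\theta\right)ds_u + \int_{\mathrm{LE}}^{\mathrm{TE}}\left(C_{p,l}\cos\theta - C_{f,l}\sin\theta\right)ds_l\right] \tag{9.15}$$

where ds is the length of a surface element, θ is the angle (positive clockwise) that p and τ make relative to the z- and x-axis, respectively, and c is the chord length. The subscripts u and l refer to the upper and lower airfoil surfaces, respectively. The horizontal (axial) force coefficient (parallel to the x-axis) acting on the airfoil is [46]

$$C_a = \frac{1}{c}\left[\int_{\mathrm{LE}}^{\mathrm{TE}}\left(-C_{p,u}\sin\theta + C_{f,u}\cos\theta\right)ds_u + \int_{\mathrm{LE}}^{\mathrm{TE}}\left(C_{p,l}\sin\theta + C_{f,l}\cos\theta\right)ds_l\right] \tag{9.16}$$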

The lift force coefficient is calculated as

$$C_l = C_n\cos\alpha - C_a\sin\alpha \tag{9.17}$$

where α is the airfoil angle of attack, and the drag force coefficient is calculated as

$$C_d = C_n\sin\alpha + C_a\cos\alpha \tag{9.18}$$
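The integrals above become sums over surface segments once C_p and C_f are available from the flow solution. The sketch below makes three assumptions: both surfaces are given as polylines ordered from LE to TE, C_p and C_f are given at segment midpoints, and θ is computed per segment as atan2(−Δz, Δx) so that it is positive clockwise as in Fig. 9.2. The shear-stress signs follow from that geometry. The flat-plate input in the example is synthetic.

```python
import numpy as np

def surface_segments(x, z):
    """Segment lengths ds and clockwise-positive angles theta of a surface
    polyline ordered from the leading edge to the trailing edge."""
    dx, dz = np.diff(x), np.diff(z)
    return np.hypot(dx, dz), np.arctan2(-dz, dx)

def airfoil_force_coefficients(upper, lower, alpha_deg, chord=1.0):
    """Discrete form of Eqs. (9.15)-(9.18).

    upper/lower: dicts with coordinates 'x', 'z' (N points, LE to TE) and
    'cp', 'cf' at the N - 1 segment midpoints.
    """
    ds_u, th_u = surface_segments(upper["x"], upper["z"])
    ds_l, th_l = surface_segments(lower["x"], lower["z"])
    cn = (np.sum(-(upper["cp"] * np.cos(th_u) + upper["cf"] * np.sin(th_u)) * ds_u)
          + np.sum((lower["cp"] * np.cos(th_l) - lower["cf"] * np.sin(th_l)) * ds_l)) / chord
    ca = (np.sum((-upper["cp"] * np.sin(th_u) + upper["cf"] * np.cos(th_u)) * ds_u)
          + np.sum((lower["cp"] * np.sin(th_l) + lower["cf"] * np.cos(th_l)) * ds_l)) / chord
    alpha = np.radians(alpha_deg)
    cl = cn * np.cos(alpha) - ca * np.sin(alpha)   # Eq. (9.17)
    cd = cn * np.sin(alpha) + ca * np.cos(alpha)   # Eq. (9.18)
    return cl, cd

# Synthetic flat plate: Cp = -1 on top, +1 below, uniform skin friction.
xs = np.linspace(0.0, 1.0, 11)
plate_u = {"x": xs, "z": np.zeros_like(xs), "cp": -np.ones(10), "cf": 0.001 * np.ones(10)}
plate_l = {"x": xs, "z": np.zeros_like(xs), "cp": np.ones(10), "cf": 0.001 * np.ones(10)}
print(airfoil_force_coefficients(plate_u, plate_l, alpha_deg=5.0))
```

For the flat plate, C_n = 2 and C_a = 0.002, so Eqs. (9.17) and (9.18) simply rotate these body-axis values by α.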

9.4 Direct Optimization 

Direct optimization is understood here as employing the high-fidelity simulation model directly in the optimization loop. The flow of the direct optimization process is shown in Fig. 9.10 and can be described as follows. First, an initial design $x^{(0)}$ is generated, and the high-fidelity CFD simulation model is evaluated at that design, yielding the values of the objective function and the constraints. Then, the optimization algorithm proposes a new airfoil design $x$, at which the high-fidelity model is evaluated and the objective and constraints are recalculated. Based on the improvement or deterioration in the objective function and on the values of the constraints (fulfilled, critical, or violated), the optimizer either proposes another design to evaluate or accepts the current design as the result of the $i$th design iteration, $x^{(i)}$. The high-fidelity model can therefore be evaluated several times during one design iteration. This loop is repeated until a termination condition is met and an optimized design has been reached. The termination condition could be, for example, based on the change in airfoil shape between two adjacent design iterations $x^{(i)}$ and $x^{(i+1)}$.
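The loop described above can be sketched as follows. This is an illustrative sketch only: `high_fidelity` is a hypothetical, cheap analytic stand-in for a CFD run, the optimizer is plain steepest descent with forward-difference gradients and a backtracking line search (chosen for brevity, not taken from the text), and termination uses the change in design between two consecutive iterations. The global counter shows that one design iteration consumes several model evaluations.

```python
import numpy as np

n_evals = 0  # counts calls to the (expensive) model


def high_fidelity(x):
    """Hypothetical stand-in for one CFD evaluation of the objective."""
    global n_evals
    n_evals += 1
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 0.5) ** 2


def fd_gradient(f, x, fx, h=1e-6):
    """Forward differences: one extra model call per design variable."""
    g = np.zeros_like(x)
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        g[j] = (f(xp) - fx) / h
    return g


def direct_optimization(f, x0, tol=1e-6, max_iter=500):
    """Steepest descent applied directly to the expensive model f."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)                                # evaluate the initial design x^(0)
    for i in range(max_iter):
        g = fd_gradient(f, x, fx)
        step = 1.0
        while step > 1e-12:                  # backtracking line search
            x_new = x - step * g
            f_new = f(x_new)
            if f_new < fx:                   # improvement found
                break
            step *= 0.5
        else:                                # no improving step: stop
            break
        design_change = np.linalg.norm(x_new - x)
        x, fx = x_new, f_new                 # accept as design x^(i+1)
        if design_change < tol:              # termination on change in design
            break
    return x, fx, i + 1


x_opt, f_opt, iters = direct_optimization(high_fidelity, [0.0, 0.0])
print(x_opt, iters, n_evals)
```

Each iteration here costs at least three model calls (two for the gradient, one or more for the line search), which is why direct optimization becomes prohibitive when a single call is a full CFD simulation.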
Fig. 9.10 Flowchart of the direct optimization process



9.4.1 Gradient-Based Methods 



The development of numerical optimization techniques pertaining to aircraft de- 
sign began when Hicks and Henne [1] coupled a gradient-based optimization algo- 
rithm with CFD codes to design airfoils and wings at both subsonic and transonic 
conditions. Substantial progress in gradient-based methods for aerodynamic de- 
sign has been made since then. Jameson [12] introduced control theory and con- 
tinuous adjoint methods to the optimal aerodynamic design for two-dimensional 
airfoils and three-dimensional wings, first using inviscid flow solvers [13, 14], and 
later using viscous flow solvers [4, 16]. The adjoint method is gradient-based, but 
it is very efficient since the computational expense incurred in the calculation of 
the gradient is effectively independent of the number of design variables. 

Eyi et al. [17] apply gradient-based optimization to the design of multi-element 
airfoils at high-lift conditions where the necessary gradients are obtained by finite- 
difference methods. Nemec and Zingg apply a gradient-based Newton-Krylov algo- 
rithm to high-lift system design [18], as well as transonic wing design [19], where the 
gradient of the objective function is computed using the discrete adjoint approach. 

Papadimitriou and Giannakoglou [10] apply continuous and discrete adjoint methods to the design optimization of two-dimensional compressor and turbine cascades. They consider various problem formulations, such as inverse design and viscous loss minimization.

Gradient-based methods are robust for local search. However, a large number of function evaluations is often needed (e.g., when gradients are computed by finite differences, which requires at least one additional simulation per design variable), and since CFD simulations can be very expensive, the overall computational cost becomes prohibitive. Furthermore, results from CFD simulations include numerical noise, which is a serious issue for gradient-based algorithms.
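Why noise is harmful can be seen from a small experiment with forward differences. The objective below is an assumption made for illustration: a smooth function plus a tiny, deterministic high-frequency oscillation standing in for the discretization and convergence noise of a CFD solver. A small step lets the noise dominate the difference quotient, while a large step suffers from truncation error.

```python
import math


def noisy_objective(x):
    """Smooth part (x - 1)^2 plus an assumed 1e-6 oscillation mimicking solver noise."""
    return (x - 1.0) ** 2 + 1e-6 * math.sin(1e7 * x)


def forward_difference(f, x, h):
    return (f(x + h) - f(x)) / h


x = 0.0  # the smooth part has derivative 2 * (x - 1) = -2 here
for h in (1e-8, 1e-5, 1e-2):
    print(f"h = {h:g}: df/dx ~ {forward_difference(noisy_objective, x, h):.4f}")
```

With $h = 10^{-8}$ the noise term contributes roughly $10^{-6}\sin(0.1)/10^{-8} \approx 10$, so the estimated slope even has the wrong sign; with $h = 10^{-2}$ the estimate is close to $-2$. For a real solver the safe step size is usually unknown, which is one reason derivative-free and surrogate-based methods are attractive.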
9.4.2 Derivative-Free Methods

Derivative-free approaches can be divided into two categories, local and global 
search methods. The local search methods include the pattern-search algorithm 
[53] and the Nelder-Mead algorithm [54]. Global search methods include Genetic 
Algorithms (GAs) [55], Evolutionary Algorithms (EAs) [56], Particle Swarm Op- 
timization (PSO) [57, 58], and Differential Evolution (DE) [59], all of which are 
often referred to as meta-heuristic algorithms. Example applications to aerody- 
namic shape optimization can be found, e.g., in [20], [49] and [51]. Another global 
search method is Simulated Annealing (SA) [60]. 

The main advantages of these methods are that they do not require gradient data 
and they can handle noisy/discontinuous objective functions. However, the afore- 
mentioned derivative-free methods normally require a large number of function 
evaluations. 
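As an illustration of the local pattern-search family mentioned above, here is a minimal compass search. It is a generic textbook variant rather than the specific algorithm of [53], and the quadratic objective is a hypothetical stand-in. Only function values are used: the method polls the $2n$ points $x \pm \Delta e_j$ and halves $\Delta$ (refines the mesh) when none of them improves.

```python
import numpy as np


def compass_search(f, x0, step=0.5, min_step=1e-6, max_evals=10_000):
    """Minimal compass (coordinate pattern) search; no derivatives needed."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n_evals = 1
    # +/- unit vectors form a positive spanning set of directions
    directions = np.vstack([np.eye(x.size), -np.eye(x.size)])
    while step > min_step and n_evals < max_evals:
        for d in directions:
            trial = x + step * d
            f_trial = f(trial)
            n_evals += 1
            if f_trial < fx:          # success: move, keep the step size
                x, fx = trial, f_trial
                break
        else:                         # no direction improved: refine the mesh
            step *= 0.5
    return x, fx, n_evals


def objective(x):
    """Hypothetical stand-in objective with minimizer (1, -0.5)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 0.5) ** 2


x_opt, f_opt, n_evals = compass_search(objective, [0.0, 0.0])
print(x_opt, f_opt, n_evals)
```

Most of the evaluations are spent confirming that no neighbor improves while the step shrinks, which illustrates why such methods typically need many function evaluations.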

9.5 Surrogate-Based Optimization 

In this section, we provide a brief overview of surrogate-based optimization (SBO) [32, 33]. We begin by presenting the concept of SBO. Then, we discuss the construction of the surrogate model, and, finally, we present a few popular surrogate-based optimization techniques.

9.5.1 The Concept 

In many situations, the functions one wants to optimize are difficult to handle. This is particularly the case in aerodynamic design, where the objective and constraint functions are typically based on CFD simulations. The major issue is the computational cost of simulation, which may be very high (e.g., up to several days or even weeks for a single high-fidelity 3D wing simulation). Another problem is the numerical noise that is always present in CFD tools. Also, the simulation may fail for specific sets of design variables (e.g., due to convergence issues). In order to alleviate these problems, it is often advantageous to replace, in the optimization process, the original objective function by its surrogate model. To make this replacement successful, the surrogate should be a sufficiently accurate representation of the original function, yet analytically tractable (smooth) and, preferably, computationally cheap (so as to reduce the overall cost of the optimization process). In practice, surrogate-based optimization is often an iterative process, where the surrogate model is re-optimized and updated using the data from the original function that is accumulated during the algorithm run.

The flow of a typical SBO algorithm is shown in Fig. 9.11. The surrogate model is optimized, in place of the high-fidelity one, to yield a prediction of the high-fidelity model's minimizer. This prediction is verified by evaluating the high-fidelity model, which is typically done only once per iteration (at every new design $x^{(i+1)}$). Depending on the result of this verification, the optimization process may be terminated or may continue, in which case the surrogate model is updated using the newly available high-fidelity model data and then re-optimized to obtain a new, and hopefully better, approximation of the minimizer. For a well-performing surrogate-based algorithm, the number of iterations is substantially smaller than for most methods that optimize the high-fidelity model directly (e.g., gradient-based ones).
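A one-dimensional toy version of this loop is sketched below. The assumptions are ours, not the chapter's: `high_fidelity` is a cheap analytic stand-in, the surrogate is simply the quadratic interpolating the three best samples found so far (essentially successive parabolic interpolation), and exactly one high-fidelity evaluation is spent per iteration to verify the surrogate's predicted minimizer. Practical SBO algorithms use richer surrogates and safeguards such as trust regions.

```python
import math


def high_fidelity(x):
    """Hypothetical stand-in for an expensive simulation; minimum at ln 2."""
    return math.exp(x) - 2.0 * x


def surrogate_minimizer(xs, ys, lo, hi):
    """Minimize the quadratic interpolating three samples (Newton form)."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)            # curvature of the surrogate
    b = d01 - a * (x0 + x1)                # s(x) = a*x**2 + b*x + const
    if a <= 0.0:                           # non-convex fit: take the better bound
        s_lo, s_hi = a * lo * lo + b * lo, a * hi * hi + b * hi
        return lo if s_lo <= s_hi else hi
    return min(max(-b / (2.0 * a), lo), hi)


def sbo(f, lo, hi, x_init, tol=1e-6, max_iter=30):
    xs = list(x_init)
    ys = [f(x) for x in xs]                # initial high-fidelity data
    for _ in range(max_iter):
        best3 = sorted(range(len(xs)), key=ys.__getitem__)[:3]
        x_new = surrogate_minimizer([xs[i] for i in best3],
                                    [ys[i] for i in best3], lo, hi)
        if min(abs(x_new - x) for x in xs) < tol:   # prediction already sampled
            break
        xs.append(x_new)                   # verification: one high-fidelity call
        ys.append(f(x_new))                # surrogate update uses the new data
    best = min(range(len(xs)), key=ys.__getitem__)
    return xs[best], ys[best], len(xs)


x_opt, f_opt, n_hf = sbo(high_fidelity, 0.0, 2.0, [0.0, 1.0, 2.0])
print(x_opt, n_hf)
```

Starting from three samples, the predicted minimizers approach $\ln 2 \approx 0.693$ within a few tens of high-fidelity calls at most.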
Fig. 9.11 A flowchart of a typical surrogate-based optimization algorithm

9.5.2 Surrogate Modeling 

The surrogates can be created either by approximating the sampled high-fidelity 
model data using regression (so-called function-approximation surrogates or func- 
tional surrogates), or by correcting physics-based low-fidelity models, which are 
less accurate but computationally cheap representations of the high-fidelity ones 
[33]. 



9.5.2.1 Functional Surrogates 

Functional surrogate models are constructed without any particular knowledge of 
the physical system. The construction process can be summarized as follows: 

1. Design of Experiments (DoE): Allocate a set of points in the design space by using a specific strategy to maximize the amount of information gained from a limited number of samples [61]. Factorial designs are classical DoE techniques; they typically spread the samples apart as much as possible to reduce any random error (which is important when obtaining data from physical experiments) [62]. Nowadays, space-filling designs are commonly used, with Latin Hypercube Sampling probably being the most popular one [61].

2. Acquire data: Evaluate the high-fidelity model at the points specified in 
step 1. 

3. Model Selection and Identification: Choose the surrogate model and de- 
termine its parameters. A number of models are available including poly- 
nomial regression [32], radial basis functions [33], kriging [62], neural 
networks [62] and support vector regression [63]. 

4. Model validation: Estimate the model generalization error. The most popular method is cross-validation [32], where the data is split into $k$ subsets, and a surrogate model is constructed $k$ times so that $k - 1$ subsets are used for training and the remaining one is used to calculate the generalization error, which is then averaged over all $k$ combinations of training/testing data.
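The four steps above can be sketched for a two-variable problem as follows. Everything here is an assumption for illustration: a basic Latin Hypercube Sampling routine (one point per stratum in every dimension), a full quadratic polynomial regression as the surrogate, and a hypothetical `high_fidelity` stand-in that happens to be exactly quadratic, so that the cross-validation error is at round-off level.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng):
    """LHS in the unit cube: each of the n_samples equal-width intervals of
    every dimension contains exactly one point."""
    strata = np.array([rng.permutation(n_samples) for _ in range(n_dims)]).T
    return (strata + rng.random((n_samples, n_dims))) / n_samples


def quadratic_features(X):
    """Full quadratic polynomial basis in two variables."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])


def fit(X, y):
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return coef


def predict(coef, X):
    return quadratic_features(X) @ coef


def kfold_rmse(X, y, k, rng):
    """k-fold cross-validation estimate of the generalization (RMS) error."""
    folds = np.array_split(rng.permutation(len(X)), k)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        coef = fit(X[train_idx], y[train_idx])          # train on k - 1 folds
        sq_errors.append((predict(coef, X[test_idx]) - y[test_idx]) ** 2)
    return np.sqrt(np.mean(np.concatenate(sq_errors)))


def high_fidelity(X):
    """Hypothetical stand-in for the expensive model (one row per design)."""
    return (X[:, 0] - 0.3) ** 2 + 2.0 * (X[:, 1] - 0.6) ** 2 + 0.5 * X[:, 0] * X[:, 1]


rng = np.random.default_rng(0)
X = latin_hypercube(20, 2, rng)             # step 1: design of experiments
y = high_fidelity(X)                        # step 2: acquire data
coef = fit(X, y)                            # step 3: model identification
cv_error = kfold_rmse(X, y, k=5, rng=rng)   # step 4: model validation
print(cv_error)
```

For a real CFD response the surrogate would not reproduce the data exactly, and the cross-validation error is what indicates whether more samples or a different model type are needed.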

The functional surrogate models are typically cheap to evaluate. However, a considerable amount of data is required to set up a surrogate model of reasonable accuracy. The methodology of constructing the functional surrogates is generic and, therefore, applicable to a wide class of problems.

9.5.2.2 Physics-Based Surrogates 

The physics-based surrogates are constructed by correcting an underlying 
low-fidelity model, which can be based on one of, or a combination of the 
following: 

• Simplified physics: Replace the set of governing fluid flow equations by a 
set of simplified equations, e.g., using the Euler equations in place of the 
RANS equations [35]. These are often referred to as variable-fidelity 
physics models. 

• Coarse discretization: Use the same fluid flow model as in the high- 
fidelity model, but with a coarser computational mesh discretization [36]. 
Often referred to as variable-resolution models. 

• Relaxed convergence criteria: Reduce the maximum number of allowable iterations and/or relax (increase) the convergence tolerance [63]. Sometimes referred to as variable-accuracy models.

As the low-fidelity model shares the same underlying physics as the high-fidelity one, it is able to predict the general behavior of the high-fidelity model. However, the low-fidelity model needs to be corrected to match the sampled data of the high-fidelity model in order to become a reliable and accurate predictor. Popular correction techniques include response correction [34] and space mapping [65]. One of the more recent techniques is shape-preserving response prediction (SPRP), introduced in [26]. The application of this technique to the design of airfoils at high-lift and transonic conditions is given in the next chapter.
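A simple additive form of response correction is sketched below: the low-fidelity model is shifted so that the surrogate matches the high-fidelity value (zero-order consistency) and slope (first-order consistency) at the current design $x_k$. This is one illustrative variant, not necessarily the correction used in [34]. The models are hypothetical one-dimensional stand-ins with analytic derivatives; in practice the derivatives would come from adjoint solutions or finite differences.

```python
import math


def high_fidelity(x):            # hypothetical "fine" model
    return math.sin(3.0 * x) + 0.5 * x * x


def d_high_fidelity(x):
    return 3.0 * math.cos(3.0 * x) + x


def low_fidelity(x):             # hypothetical "coarse" model: same trend, biased
    return 0.9 * math.sin(3.0 * x + 0.1) + 0.5 * x * x + 0.2


def d_low_fidelity(x):
    return 2.7 * math.cos(3.0 * x + 0.1) + x


def corrected_surrogate(x_k, first_order=True):
    """Additive correction of the low-fidelity model at the design x_k."""
    delta0 = high_fidelity(x_k) - low_fidelity(x_k)          # value mismatch
    delta1 = (d_high_fidelity(x_k) - d_low_fidelity(x_k)     # slope mismatch
              if first_order else 0.0)

    def s(x):
        return low_fidelity(x) + delta0 + delta1 * (x - x_k)

    return s


x_k = 0.4
s = corrected_surrogate(x_k)
h = 1e-6
print(s(x_k) - high_fidelity(x_k))                                  # close to 0
print((s(x_k + h) - s(x_k - h)) / (2 * h) - d_high_fidelity(x_k))   # close to 0
```

Dropping the slope term (`first_order=False`) keeps only zero-order consistency, so the surrogate's gradient at $x_k$ no longer agrees with the high-fidelity one.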

The physics-based surrogates are typically more expensive to evaluate than the functional surrogates. Furthermore, they are problem specific, i.e., reuse across different problems is rare. On the other hand, their fundamental advantage is that much less high-fidelity model data is needed to obtain a given accuracy level than in the case of functional surrogates. Some SBO algorithms exploiting physics-based low-fidelity models (often referred to as variable- or multi-fidelity algorithms) require just a single high-fidelity model evaluation per algorithm iteration to construct the surrogate [34, 38]. One consequence is that variable-fidelity SBO methods scale better to larger numbers of design variables (assuming that no derivative information is required) than SBO using functional surrogates.

9.5.3 Optimization Techniques 

This section presents selected optimization techniques that employ physics-based 
low-fidelity surrogate models. We describe the Approximation Model Manage- 
ment Optimization (AMMO) algorithm [34-36] and the Surrogate Management 
Framework (SMF) [38]. We also briefly mention a few other techniques. 

9.5.3.1 Approximation Model Management Optimization (AMMO) 

The AMMO algorithm is a general approach for controlling the use of variable- 
fidelity models when solving a nonlinear minimization problem, such as Eq. (9.3), 
[34-36]. A flowchart of the AMMO algorithm is shown in Fig. 9.12. The opti- 
mizer receives the function and constraint values, as well as their sensitivities, 
from the low-fidelity model. The response of the low-fidelity model is corrected to 
satisfy at least zero- and first-order consistency conditions with the high-fidelity 
model, i.e., agreement between the function values and the first-order derivatives 
at a given iteration point. The expensive high-fidelity computations are performed 
outside the optimization loop and serve to re-calibrate the low-fidelity model occasionally, based on a set of systematic criteria. AMMO exploits the trust-region methodology [66], which is an adaptive move-limit strategy for improving the global behavior of optimization algorithms based on local models. By combining the trust-region approach with the use of a low-fidelity model satisfying at least first-order consistency conditions, convergence of AMMO to a stationary point (typically a local optimum) of the high-fidelity model can be guaranteed.
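The interplay between first-order correction and the trust region can be illustrated with the simplified one-dimensional sketch below. It is not the AMMO implementation of [34-36]: the models are hypothetical analytic stand-ins, the corrected surrogate is minimized inside the trust interval by a dense grid search (affordable because the surrogate is cheap), and the radius-update thresholds 0.25 and 0.75 are common textbook choices assumed here. Only function values of `f_hi` are counted; its derivative would also cost high-fidelity effort (e.g., an adjoint solve).

```python
import numpy as np


def f_hi(x):                     # hypothetical high-fidelity model
    return np.sin(3.0 * x) + 0.5 * x**2


def df_hi(x):                    # its derivative (e.g., from an adjoint solver)
    return 3.0 * np.cos(3.0 * x) + x


def f_lo(x):                     # hypothetical low-fidelity model: similar, biased
    return 0.9 * np.sin(3.0 * x + 0.1) + 0.5 * x**2 + 0.2


def df_lo(x):
    return 2.7 * np.cos(3.0 * x + 0.1) + x


def trust_region_sbo(x, radius=0.5, min_radius=1e-8, max_iter=100):
    f_x = f_hi(x)
    n_hi = 1                                      # high-fidelity evaluations
    for _ in range(max_iter):
        # First-order additive correction: surrogate matches f_hi and df_hi at x
        d0 = f_x - f_lo(x)
        d1 = df_hi(x) - df_lo(x)
        # Cheap surrogate optimization restricted to [x - radius, x + radius]
        grid = np.linspace(x - radius, x + radius, 2001)
        s_grid = f_lo(grid) + d0 + d1 * (grid - x)
        k = int(np.argmin(s_grid))
        x_trial, predicted = grid[k], f_x - s_grid[k]
        if predicted <= 0.0 or radius < min_radius:
            break
        f_trial = f_hi(x_trial)                   # verify with the expensive model
        n_hi += 1
        rho = (f_x - f_trial) / predicted         # actual vs. predicted reduction
        step = abs(x_trial - x)
        if rho > 0.0:                             # accept the new design
            x, f_x = x_trial, f_trial
        if rho < 0.25:                            # poor agreement: shrink region
            radius *= 0.25
        elif rho > 0.75 and step > 0.99 * radius:  # good agreement at the boundary
            radius *= 2.0
    return x, f_x, n_hi


x_opt, f_opt, n_hi = trust_region_sbo(0.0)
print(x_opt, f_opt, n_hi)
```

The ratio `rho` is the adaptive move-limit mechanism: where the corrected surrogate predicts the high-fidelity improvement well, the region grows; otherwise it shrinks until first-order consistency makes the surrogate locally trustworthy.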
Fig. 9.12 A flowchart of the Approximation Model Management Optimization (AMMO) algorithm

The AMMO methodology has been applied to various aerodynamic design 
problems [34-36]. In [36], AMMO is applied to both 2D airfoil shape optimization 
and 3D aerodynamic wing optimization, both at transonic operating conditions. 
The Euler equations are used as the governing fluid flow equations for both the high- and low-fidelity models at variable grid resolution, i.e., a fine grid for the high-fidelity model and a coarse grid for the low-fidelity model. The results showed a threefold reduction in computational cost for the 3D wing design problem, when compared to direct optimization of the high-fidelity model, and a twofold reduction for the 2D airfoil design problem. In [35], AMMO is applied to 2D airfoil design at transonic conditions using the RANS equations (representing viscous flow past the airfoil) as the high-fidelity model and the Euler equations (representing inviscid flow past the airfoil) for the low-fidelity model. The high-fidelity model is solved on a much finer mesh than the low-fidelity model. The results demonstrated a fivefold reduction in computational cost when compared to direct optimization of the high-fidelity model.

First-order consistency in variable-fidelity SBO can be insufficient to achieve 
acceptable convergence rates, which can be similar to those achieved by first- 
order optimization methods, such as steepest-descent or sequential linear pro- 
gramming [67]. More successful optimization methods, such as sequential quad- 
ratic programming, use at least approximate second-order information to achieve 
super-linear or quadratic convergence rates in the neighborhood of the minimum. 
Eldred et al. [68] present second-order correction methods for variable-fidelity SBO algorithms. The second-order corrections enforce consistency with the high-fidelity model Hessian. However, since full second-order information is not commonly available in practical engineering problems, consistency can also be enforced with respect to an approximation of the high-fidelity Hessian obtained by finite differences, quasi-Newton updates, or the Gauss-Newton approximation. The results show that all of these approaches outperform the first-order corrections. On the other hand, the second-order corrections come at a price, since additional function evaluations are required. Additionally, they can become impractical for large design problems unless adjoint-based gradients are employed. Finally, the issue of how numerical noise affects the second-order corrected SBO process has not been addressed.
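For reference, one common additive form of a second-order corrected surrogate around the current iterate $x_k$ is written below (notation assumed here: $f$ is the high-fidelity model, $c$ the low-fidelity model, $s_k$ the corrected surrogate). Dropping the last term recovers the first-order correction; replacing $\nabla^2 f(x_k)$ by a finite-difference, quasi-Newton, or Gauss-Newton approximation gives the approximate second-order variants discussed above.

```latex
s_k(x) = c(x) + \left[f(x_k) - c(x_k)\right]
       + \left[\nabla f(x_k) - \nabla c(x_k)\right]^{T}(x - x_k)
       + \tfrac{1}{2}\,(x - x_k)^{T}\left[\nabla^2 f(x_k) - \nabla^2 c(x_k)\right](x - x_k)
```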

9.5.3.2 Surrogate Management Framework (SMF) 

The Surrogate Management Framework (SMF) algorithm [38] is a mesh-based 
technique that uses the surrogate model as a predictive tool, while retaining the 
robust convergence properties of pattern-search methods for a local grid search. 

The SMF algorithm (Fig. 9.13) consists of two steps, SEARCH and POLL. In the SEARCH step, the surrogate model is used to identify a set of points likely to minimize the objective function. The SEARCH can explore the surrogate model globally or locally. In either case, this step is not required for the convergence of the algorithm.




Fig. 9.13 A flowchart of the Surrogate Management Framework (SMF) 
The convergence of the SMF is ensured by the POLL step, where the neighbors of the current best solution on the mesh, taken along a positive spanning set of directions [69], are evaluated using the high-fidelity model to look for a local improvement of the objective function. If the POLL step fails to improve the objective function value, the mesh is refined and a new iteration begins, starting with the SEARCH step.

The surrogate model is updated in each iteration using all accumulated 
high-fidelity data. 
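A simplified two-dimensional sketch of the SEARCH/POLL structure follows. Several assumptions differ from the SMF as used in practice: the surrogate is a quadratic least-squares fit to all accumulated data instead of kriging, the SEARCH screens only the surrogate on a 5 x 5 patch of mesh points around the incumbent and spends one high-fidelity call on the most promising one, the mesh is centered on the incumbent, and `high_fidelity` is a hypothetical stand-in. As in the text, convergence relies on the POLL step and the mesh refinement, not on the surrogate.

```python
import itertools
import numpy as np


def high_fidelity(x):
    """Hypothetical stand-in for an expensive simulation."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 0.5) ** 2 + 0.3 * np.sin(4.0 * x[0])


def features(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])


def smf(f, x0, delta=0.5, min_delta=1e-4, max_evals=1000):
    X, Y = [], []                           # archive of all high-fidelity data

    def evaluate(x):
        X.append(np.array(x, dtype=float))
        Y.append(f(x))
        return Y[-1]

    x = np.asarray(x0, dtype=float)
    fx = evaluate(x)
    poll_dirs = np.vstack([np.eye(2), -np.eye(2)])   # positive spanning set
    while delta > min_delta and len(Y) < max_evals:
        improved = False
        # SEARCH: surrogate rebuilt from all data, screens nearby mesh points
        if len(Y) >= 6:
            coef, *_ = np.linalg.lstsq(features(np.array(X)), np.array(Y), rcond=None)
            cands = np.array([x + delta * np.array(o)
                              for o in itertools.product(range(-2, 3), repeat=2)
                              if o != (0, 0)])
            trial = cands[np.argmin(features(cands) @ coef)]
            if not any(np.allclose(trial, p) for p in X):
                f_trial = evaluate(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        # POLL: evaluated only if SEARCH failed; guarantees convergence
        if not improved:
            for d in poll_dirs:
                trial = x + delta * d
                f_trial = evaluate(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            delta *= 0.5                    # refine the mesh
    return x, fx, len(Y)


x_opt, f_opt, n_evals = smf(high_fidelity, [0.0, 0.0], max_evals=1000)
print(x_opt, f_opt, n_evals)
```

A failed POLL at mesh size $\Delta$ bounds each gradient component by roughly curvature times $\Delta/2$, which is the mechanism behind the local convergence guarantee.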

In [69], the SMF algorithm is applied to the optimal aeroacoustic shape design of an airfoil in laminar flow. The airfoil shape is designed to minimize the total radiated acoustic power while constraining lift and drag. The high-fidelity model is implemented through the solution of the unsteady incompressible two-dimensional Navier-Stokes equations, with an analysis time of roughly 24 hours for a single CFD evaluation. The surrogate function is constructed using kriging, which is typical when using the SMF algorithm. As the acoustic noise is generated at the airfoil trailing edge (TE), only the upper TE region of the airfoil is parameterized, with a spline using five control points. Optimal shapes that minimize noise are reported. Results show a significant reduction (as much as 80%) in acoustic power at a reasonable computational cost (fewer than 88 function evaluations).

9.5.3.3 Other Techniques 

Robinson et al. [70-72] presented a provably convergent trust-region model- 
management (TRMM) methodology for variable-parameterization design models. 
This is an SBO method which uses a lower-fidelity model as a surrogate. How- 
ever, the low-fidelity design space has a lower dimension than the high-fidelity 
design space. The design variables of the low-fidelity model can be a subset of the 
high-fidelity model, or they can be different from the high-fidelity model. The 
mathematical relationship between the design vectors is described by space mapping (SM) [65, 73-75]. Since SM does not provide provable convergence within a TRMM framework, whereas any surrogate that is first-order accurate does, they correct the space mapping to be at least first-order accurate; this variant is called corrected space mapping.

The TRMM method has been applied to several constrained design problems. One problem involves the design of a wing planform (minimizing induced drag while constraining lift for a given wingspan) using a vortex-lattice method as the high-fidelity model and a lifting-line method as the low-fidelity model. The results indicated over 76% savings in high-fidelity function calls as compared to direct optimization. Another problem involves the design of a flapping wing, where an unsteady panel method is used as the high-fidelity model and the low-fidelity model is based on thin-airfoil theory and is assumed to be quasi-steady. Approximately 48% savings in high-fidelity function calls were demonstrated when compared to direct optimization.

Several other optimization techniques are available that exploit a surrogate 
constructed from a physics-based low-fidelity model. In SM, the surrogate is a 
composition of the low-fidelity model and simple, usually linear, transformations 
that re-shape the model domain (input-like SM [65]), correct the model response 
(output-like SM [74]) or change the overall model properties (implicit-like SM
[74]) using additional variables that are not directly used in the optimization proc- 
ess (so-called pre-assigned parameters). Manifold mapping (MM) [76] is a special 
case of output-like SM that aims at enforcing first-order consistency conditions 
using exact or approximate high-fidelity model sensitivity data. These methods 
are, however, not (yet) popular in the case of aerodynamic shape optimization. 
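Using notation adopted here only for illustration ($R_c$ for the low-fidelity model response, $R_s$ for the surrogate, $p$ for the pre-assigned parameters), the three SM variants described above can be summarized as follows, where the mapping parameters $B$, $c$, $A$, $d$ or $p$ are obtained by fitting the surrogate to the high-fidelity data collected so far:

```latex
\begin{aligned}
\text{input-like SM:}\quad    & R_s(x) = R_c(B x + c) \\
\text{output-like SM:}\quad   & R_s(x) = A\,R_c(x) + d \\
\text{implicit-like SM:}\quad & R_s(x) = R_c(x;\, p)
\end{aligned}
```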

Shape-preserving response prediction (SPRP) is a relatively novel technique 
which was introduced in the field of microwave engineering [77], but it has been 
recently applied to airfoil shape optimization [26, 78]. This technique is described 
in detail in the next chapter. 

9.5.3.4 Exploration versus Exploitation 

One of the important steps of the SBO process is to update the surrogate model using the high-fidelity model data accumulated during the algorithm run. In particular, the high-fidelity model is evaluated at any new design obtained from the prediction provided by the surrogate model. The new points at which we
evaluate the high-fidelity model are referred to as infill points [33]. Selection of 
these points is based on certain infill criteria. These criteria can be either exploita- 
tion- or exploration-based. 

A popular exploitation-based strategy is to select the surrogate minimizer as the 
new infill point [33]. This strategy is able to ensure finding at least a local mini- 
mum of the high-fidelity model provided that the surrogate model satisfies zero- 
and first-order consistency conditions. In general, using the surrogate model optimum as a validation point corresponds to exploitation of a certain region of the design space, i.e., the neighborhood of a local optimum. Selecting the surrogate minimizer as the infill point is utilized by AMMO [34-36], SM [65, 73-75], and MM [76], and can also be used by SMF [38].

In exploration-based strategies, the new sample points are located in between 
the existing ones. This allows building a surrogate model that is globally accurate. 
A possible infill criterion is to allocate the new samples at the points of maximum 
estimated error [33]. Pure exploration, however, may not be a good way of updating the surrogate model in the context of optimization, because the time spent on accurately modeling suboptimal regions may be wasted if the global optimum is the only interest.

Probably the best way of performing global search is to balance exploration and 
exploitation of the design space. The details regarding several possible approaches 
can be found in [33]. 
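The trade-off can be illustrated with a simple weighted infill score, sketched below under explicit assumptions: the candidate set, samples and `surrogate` are hypothetical, and the distance to the nearest existing sample is used as a crude proxy for the surrogate's estimated error (a kriging model would provide a proper error estimate, and criteria such as expected improvement balance the two terms in a more principled way). A weight of zero gives pure exploitation, a large weight gives essentially pure exploration.

```python
import numpy as np


def select_infill(candidates, samples, surrogate, weight):
    """Pick the next high-fidelity sample among candidate designs.

    weight = 0     -> exploitation (surrogate minimizer)
    large weight   -> exploration (far from existing samples)
    Distance to the nearest sample is an assumed proxy for the error estimate.
    """
    dist = np.min(np.abs(candidates[:, None] - samples[None, :]), axis=1)
    score = surrogate(candidates) - weight * dist
    return candidates[np.argmin(score)]


samples = np.array([0.0, 0.2, 0.5, 1.0])   # designs already evaluated
candidates = np.linspace(0.0, 1.0, 101)


def surrogate(x):                          # hypothetical surrogate prediction
    return (x - 0.45) ** 2


for w in (0.0, 0.1, 10.0):
    print(w, select_infill(candidates, samples, surrogate, w))
```

With `weight=0` the infill point is the surrogate minimizer at 0.45; with a large weight it moves to 0.75, the middle of the widest unsampled gap.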

9.6 Summary 

Although aerodynamic shape optimization (ASO) is widely used in engineering 
design, there are numerous challenges involved. One of the biggest challenges is 
that high-fidelity computational fluid dynamics (CFD) simulations are (usually) computationally expensive. As a result, the overall computational cost of the design optimization process becomes prohibitive since, typically, a large number of simulations is required. Therefore, it is impractical to employ the high-fidelity model directly in the optimization loop.

One of the objectives of surrogate-based optimization (SBO) is to reduce the 
overall computational cost by replacing the high-fidelity model in the optimization 
loop by a cheap surrogate model. The surrogate models can be created (1) by approximating the sampled high-fidelity model data using regression (so-called function-approximation surrogates), or (2) by correcting physics-based low-fidelity models, which are less accurate but computationally cheap representations of the high-fidelity models.

A variety of techniques are available to create the function-approximation surrogate model, such as polynomial regression and kriging. Function-approximation models are versatile; however, they normally require a substantial number of data samples to ensure good accuracy. The physics-based surrogates are constructed by correcting the underlying low-fidelity models, which can be obtained through (1) simplified physics models, (2) coarser discretization, and (3) relaxed convergence criteria. These models are typically more expensive to evaluate than the function-approximation surrogates, but less high-fidelity model data is needed to obtain a given accuracy level.

The low-fidelity models need to be corrected to become accurate and reliable representations of the high-fidelity model. Popular correction methods include response correction and space mapping.

In SBO with physics-based low-fidelity models, called variable- or multi-fidelity SBO, only a single high-fidelity model evaluation is typically required per algorithm iteration. As a result, variable-fidelity SBO methods scale naturally to larger numbers of design variables (assuming that no derivative information is required).

The Approximation Model Management Optimization (AMMO) is a generic SBO approach. AMMO is based on ensuring zero- and first-order
consistency conditions between the high-fidelity model and the surrogate by using 
a suitable correction term. This requires derivative information. Another SBO 
technique is the Surrogate Management Framework (SMF) algorithm. SMF is a 
mesh-based technique that uses the surrogate model as a predictive tool, while re- 
taining the robust convergence properties of pattern search methods for a local 
grid search. Typically, the surrogate model is constructed using kriging and the 
surrogate is updated in each iteration using all accumulated high-fidelity data. 
Convergence (at least to a local minimum) is ensured by the pattern search. 
Chapter 10

Multi-Objective Aerodynamic Shape Optimization

Alfredo Arias-Montaño, Carlos A. Coello Coello, and Efrén Mezura-Montes



Abstract. Optimization problems in many industrial applications are very hard to 
solve. Many examples of them can be found in the design of aeronautical systems. 
In this field, the designer is frequently faced with the problem of considering not 
only a single design objective, but several of them, i.e., the designer needs to solve 
a Multi-Objective Optimization Problem (MOP). In aeronautical systems design, 
aerodynamics plays a key role in aircraft design, as well as in the design of propul- 
sion system components, such as turbine engines. Thus, aerodynamic shape opti- 
mization is a crucial task, and has been extensively studied and developed. Multi- 
Objective Evolutionary Algorithms (MOEAs) have gained popularity in recent years 
as optimization methods in this area, mainly because of their simplicity, their ease of 
use and their suitability to be coupled to specialized numerical simulation tools. In 
this chapter, we will review some of the most relevant research on the use of MOEAs 
to solve multi-objective and/or multi-disciplinary aerodynamic shape optimization 
problems. In this review, we will highlight some of the benefits and drawbacks of 
the use of MOEAs, as compared to traditional design optimization methods. In the 
second part of the chapter, we will present a case study on the application of MOEAs 
for the solution of a multi-objective aerodynamic shape optimization problem. 
10.1 Introduction

There are many industrial areas in which optimization processes help to find new 
solutions and/or to increase the performance of an existing one. Thus, in many cases 
a research goal can be translated into an optimization problem. Optimal design in 
aeronautical engineering is, by nature, a multiobjective, multidisciplinary and highly 
difficult problem. Aerodynamics, structures, propulsion, acoustics, manufacturing 
and economics, are some of the disciplines involved in this type of problems. In 
fact, even if a single discipline is considered, many design problems in aeronautical 
engineering have conflicting objectives (e.g., to optimize a wing's lift and drag or 
a wing's structural strength and weight). The increasing demand for optimal and 
robust designs, driven by economics and environmental constraints, along with the 
advances in computational intelligence and the increasing computing power, has 
improved the role of computational simulations, from being just analysis tools to 
becoming design optimization tools. 

In spite of the fact that gradient-based numerical optimization methods have been 
successfully applied in a variety of aeronautical/aerospace design problems^ [30. 
[161112 their use is considered a challenge due to the following difficulties found in 
practice: 

1. The design space is frequently multimodal and highly non-linear.

2. Evaluating the objective function (performance) for the design candidates is usually time consuming, due mainly to the high fidelity and high dimensionality required in the simulations.

3. By themselves, single-discipline optimizations may provide solutions that do not necessarily satisfy objectives and/or constraints considered in other disciplines.

4. The complexity of the sensitivity analyses in Multidisciplinary Design Optimization (MDO)² increases as the number of disciplines involved becomes larger.

5. In MDO, a trade-off solution, or a set of them, is sought.

Based on the previously indicated difficulties, designers have been motivated to 
use alternative optimization techniques such as Evolutionary Algorithms (EAs) 
BTl l20l |33 1 . Multi-Objective Evolutionary Algorithms (MOEAs) have gained an 
increasing popularity as numerical optimization tools in aeronautical and aerospace 
engineering during the last few years 0] [2T1 . These population-based methods 
mimic the evolution of species and the survival of the fittest, and compared to tradi- 
tional optimization techniques, they present the following advantages: 

(a) Robustness: In practice, they produce good approximations to optimal sets of 
solutions, even in problems with very large and complex design spaces, and are 
less prone to get trapped in local optima. 



1 It is worth noting that most of the applications using gradient-based methods have adopted 
them to find global optima or a single compromise solution for multi-objective problems. 

2 Multidisciplinary Design Optimization, by its nature, can be considered as a multi- 
objective optimization problem, where each discipline aims to optimize a particular per- 
formance metric. 
(b) Multiple Solutions per Run: As MOEAs use a population of candidates, they are
designed to generate multiple trade-off solutions in a single run. 

(c) Easy to Parallelize: The design candidates in a MOEA population, at each 
generation, can be evaluated in parallel using diverse paradigms. 

(d) Simplicity: MOEAs use only the objective function values for each design can- 
didate. They do not require a substantial modification or complex interfacing 
for using a CFD (Computational Fluid Dynamics) or CSD/M (Computational 
Structural Dynamics/Mechanics) code. 

(e) Easy to Hybridize: Along with the simplicity previously stated, MOEAs also allow an easy hybridization with alternative methods, e.g., memetic algorithms, which additionally introduce specificities to the implementation without influencing the MOEA simplicity.

(f) Novel Solutions: In many cases, gradient-based optimization techniques converge to designs which have little variation even if produced with very different initial setups. In contrast, the inherent explorative capabilities of MOEAs allow them to produce, sometimes, novel and non-intuitive designs.

A substantial body of work has been published on the use of MOEAs in aeronautical engineering applications (mainly motivated by the advantages previously addressed). In this chapter, we provide a review of some representative works, dealing specifically with multi-objective aerodynamic shape optimization.

The remainder of this chapter is organized as follows: In Section 10.2, we present some basic concepts and definitions adopted in multi-objective optimization. Next, in Section 10.3, we review some of the work done in the area of multi-objective aerodynamic shape optimization. This review covers surrogate-based optimization, hybrid MOEA optimization, robust design optimization, multidisciplinary design optimization, and data mining and knowledge extraction. In Section 10.4, we present a case study and, finally, in Section 10.5, we present our conclusions and final remarks.

10.2 Basic Concepts 

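A Multi-Objective Optimization Problem (MOP) can be mathematically defined as
follows:

$$\text{minimize} \quad \mathbf{f}(\mathbf{x}) := [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})] \tag{10.1}$$

subject to:

$$g_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, m \tag{10.2}$$

$$h_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, p \tag{10.3}$$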



Without loss of generality, minimization is assumed in the following definitions, since any 
maximization problem can be transformed into a minimization one. 
where $\mathbf{x} = [x_1, x_2, \ldots, x_n]$ is the vector of decision variables, which are
bounded by lower ($\mathbf{x}^l$) and upper ($\mathbf{x}^u$) limits that define the search space
$\mathcal{S}$; $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, k$, are the objective functions; and
$g_i, h_j : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, $j = 1, \ldots, p$, are the constraint functions
of the problem.

In other words, we aim to determine, from among the set $\mathcal{F} \subseteq \mathcal{S}$ ($\mathcal{F}$ is the
feasible region of the search space $\mathcal{S}$) of all vectors that satisfy the constraints,
those that yield the optimum values for all the $k$ objective functions simultaneously.
The set of constraints of the problem defines $\mathcal{F}$. Any vector of variables $\mathbf{x}$ that
satisfies all the constraints is considered a feasible solution. In their original
versions, EAs (and also MOEAs) lack a mechanism to deal with constrained search
spaces. This has motivated a considerable amount of research regarding the design
and implementation of constraint-handling techniques for both EAs and MOEAs [10, 29].
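As a minimal sketch of the feasibility notion just described, the Python below checks whether a candidate lies in the feasible region by aggregating constraint violations into one non-negative number. The box-bound representation, the callables `g` and `h`, and the equality tolerance `eps` are assumptions made for illustration; summing violations is one common convention in constraint handling, not a technique prescribed by the works cited above.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Constraint = Callable[[Vector], float]


def constraint_violation(x: Vector, g: Sequence[Constraint],
                         h: Sequence[Constraint], eps: float = 1e-6) -> float:
    """Aggregate violation of g_i(x) <= 0 and h_j(x) = 0; 0.0 means feasible.

    Equality constraints are relaxed to |h_j(x)| <= eps, a common practical
    choice (the tolerance value is an assumption, not taken from the text).
    """
    v = sum(max(0.0, gi(x)) for gi in g)
    v += sum(max(0.0, abs(hj(x)) - eps) for hj in h)
    return v


def is_feasible(x: Vector, g: Sequence[Constraint], h: Sequence[Constraint],
                lower: Vector, upper: Vector, eps: float = 1e-6) -> bool:
    """True if x lies in the box [lower, upper] and satisfies every constraint."""
    in_bounds = all(lo <= xi <= up for xi, lo, up in zip(x, lower, upper))
    return in_bounds and constraint_violation(x, g, h, eps) == 0.0


# Hypothetical example: x0 + x1 <= 1 and x0 - x1 = 0 on the unit box.
g = [lambda x: x[0] + x[1] - 1.0]
h = [lambda x: x[0] - x[1]]
print(is_feasible([0.25, 0.25], g, h, [0.0, 0.0], [1.0, 1.0]))  # True
print(constraint_violation([0.75, 0.75], g, h))                  # 0.5
```

Returning a graded violation rather than a plain yes/no lets a constraint-handling scheme rank infeasible candidates by how far they are from $\mathcal{F}$.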

10.2.1 Pareto Dominance 

Pareto dominance is an important component of the notion of optimality in MOPs 
and is formally defined as follows: 

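Definition 1. A vector of decision variables $\mathbf{x} \in \mathbb{R}^n$ dominates another vector of
decision variables $\mathbf{y} \in \mathbb{R}^n$ (denoted by $\mathbf{x} \preceq \mathbf{y}$) if and only if $\mathbf{x}$ is partially
less than $\mathbf{y}$, i.e., $\forall i \in \{1, \ldots, k\}: f_i(\mathbf{x}) \le f_i(\mathbf{y}) \;\wedge\; \exists i \in \{1, \ldots, k\}: f_i(\mathbf{x}) < f_i(\mathbf{y})$.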

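Definition 2. A vector of decision variables $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^n$ is nondominated with
respect to $\mathcal{X}$ if there does not exist another $\mathbf{x}' \in \mathcal{X}$ such that $\mathbf{f}(\mathbf{x}') \preceq \mathbf{f}(\mathbf{x})$.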

In order to say that a solution dominates another one, it needs to be strictly better in 
at least one objective, and not worse in any of them. 
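Definition 1 translates almost literally into code. The sketch below (Python, with objective vectors passed as plain sequences, an assumption for illustration) compares the vectors $\mathbf{f}(\mathbf{x})$ and $\mathbf{f}(\mathbf{y})$ componentwise under minimization:

```python
from typing import Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Pareto dominance for minimization: fx is no worse than fy in every
    objective and strictly better in at least one."""
    if len(fx) != len(fy):
        raise ValueError("objective vectors must have the same length k")
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


print(dominates([1.0, 2.0], [1.0, 3.0]))  # True: equal in f1, better in f2
print(dominates([1.0, 3.0], [2.0, 1.0]))  # False: the two vectors are incomparable
print(dominates([1.0, 2.0], [1.0, 2.0]))  # False: identical vectors do not dominate
```

The last case shows why the strict-improvement condition matters: without it, every vector would dominate itself.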

10.2.2 Pareto Optimality 

The formal definition of Pareto optimality is provided next: 

Definition 3. A vector of decision variables $\mathbf{x}^* \in \mathcal{F} \subseteq \mathcal{S} \subseteq \mathbb{R}^n$ is Pareto optimal
if it is nondominated with respect to $\mathcal{F}$.

In words, this definition says that $\mathbf{x}^*$ is Pareto optimal if there exists no feasible
vector $\mathbf{x}$ that would decrease some objective without causing a simultaneous increase
in at least one other objective (assuming minimization). This definition does not
provide us with a single solution (in decision variable space), but with a set of
solutions, which form the so-called Pareto Optimal Set ($\mathcal{P}^*$), whose formal definition
is given by:

Definition 4. The Pareto optimal set $\mathcal{P}^*$ is defined by:

$$\mathcal{P}^* = \{\mathbf{x} \in \mathcal{F} \mid \mathbf{x} \text{ is Pareto optimal}\}$$
The vectors that correspond to the solutions included in the Pareto optimal set are
said to be nondominated. 

10.2.3 Pareto Front 

When all nondominated solutions are plotted in objective function space, the non-
dominated vectors are collectively known as the Pareto front ($\mathcal{PF}^*$).

Definition 5. The Pareto front $\mathcal{PF}^*$ is defined by:

$$\mathcal{PF}^* = \{\mathbf{f}(\mathbf{x}) \in \mathbb{R}^k \mid \mathbf{x} \in \mathcal{P}^*\}$$

The goal in a MOP consists of determining $\mathcal{P}^*$ from the set $\mathcal{F}$ of all the decision
variable vectors that satisfy (10.2) and (10.3). Thus, when solving a MOP, we aim to
find not one, but a set of solutions representing the best possible trade-offs among
the objectives (the so-called Pareto optimal set).
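In practice, a MOEA only ever holds a finite set of evaluated designs, so what it can report is the nondominated subset of that set: an approximation of $\mathcal{PF}^*$, not the true front. The brute-force filter below is a sketch under that assumption; the example objective values are made up for illustration, and the faster non-dominated sorting used by algorithms such as NSGA-II is not reproduced here.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))


def nondominated_indices(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the objective vectors not dominated by any other vector.

    Brute-force O(N^2 k) pairwise comparison, adequate for small sets.
    """
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Objective vectors (f1, f2) of four evaluated designs, both minimized.
F = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(nondominated_indices(F))  # [0, 1, 3]; (3, 4) is dominated by (2, 3)
```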

10.3 Multi-Objective Aerodynamic Shape Optimization 
10.3.1 Problem Definition 

Aerodynamics is the science that deals with the interactions between fluid flows and
objects. This interaction is governed by conservation laws, which are mathematically
expressed by means of the Navier-Stokes equations, a set of unsteady, nonlinear,
coupled partial differential equations. Aerodynamicists are interested in the effects
of this interaction in terms of aerodynamic forces and moments, which result from
integrating the pressure and shear stress distributions that the flow exerts over the
object with which it is interacting. In its early days, aerodynamic design relied on
extensive use of experimental facilities. Nowadays, the use of Computational Fluid
Dynamics (CFD) technology to simulate the flow around complete aircraft
configurations has made it possible to obtain very impressive results with the help
of high-performance computers and fast numerical algorithms. At the same time,
experimental verification is carried out in scaled flight tests, avoiding many of the
inherent disadvantages and extremely high costs of wind tunnel technology.
Therefore, we can consider aerodynamics a mature engineering science.

Thus, current aerodynamic research focuses on finding new designs and/or im- 
proving current ones, by using numerical optimization techniques. In the case of 
multi-objective optimization, the objective functions are defined in terms of aero- 
dynamic coefficients and/or flow conditions. Additionally, design constraints are 
included to render the solutions practical or realizable in terms of manufacturing 
and/or operating conditions. Optimization is accomplished by means of a more or 
less systematic variation of the design variables which parameterize the shape to be 
optimized. A variety of optimization algorithms, ranging from gradient-based
methods to stochastic approaches with highly sophisticated schemes for the
adaptation of the individual mutation step sizes, are currently available. Among
them, MOEAs have been found to be a powerful but easy-to-use choice. Next, we
briefly review some of the most representative works on the use of MOEAs for
aerodynamic design. The review covers the following dimensions, which we
identify as the most relevant, from a practical point of view, for the purposes of
this chapter:

• Surrogate-based optimization, 

• Hybrid MOEA optimization, 

• Robust design optimization, 

• Multidisciplinary design optimization, and

• Data mining and knowledge extraction.

10.3.2 Surrogate-Based Optimization 

Evolutionary algorithms, being population-based, often require population sizes
and numbers of evolution steps (generations) that can demand tremendous amounts
of computing resources. For example, Benini [4] reported computational times of
2000 hours for the multi-objective redesign of a transonic turbine rotor blade,
using a population of 20 design candidates evolved over 100 generations on a
four-processor workstation. Thus, when expensive function evaluations are
required, the CPU time needed may make the application of MOEAs prohibitive,
even with today's computing power.

To tackle this problem, a common technique adopted in aerodynamic shape
optimization is the use of surrogate models. These models are built to approximate
computationally expensive functions. The main objective in constructing them is to
provide a reasonably accurate approximation of the real functions while reducing
the computational cost by several orders of magnitude. Surrogate models range
from Response Surface Methods (RSM) based on low-order polynomial functions,
through Gaussian processes or Kriging, Radial Basis Functions (RBFs), and
Artificial Neural Networks (ANNs), to Support Vector Machines (SVMs). A detailed
description of each of these techniques is beyond the scope of this chapter, but the
interested reader is referred to Jin [19] for a comprehensive review of these and
other approximation techniques.
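To make the RSM idea concrete, here is a minimal sketch of a full second-order response surface fitted by least squares with NumPy. The function `expensive` is a hypothetical stand-in for a CFD evaluation, and the sample size and random sampling are illustrative assumptions, not choices taken from the studies reviewed below.

```python
import itertools

import numpy as np


def quadratic_features(X) -> np.ndarray:
    """Design matrix of a full second-order model: 1, x_i, and x_i * x_j (i <= j)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j]
             for i, j in itertools.combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)


def fit_rsm(X, y):
    """Least-squares fit of the RSM coefficients; returns a cheap predictor."""
    A = quadratic_features(X)
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return lambda Xq: quadratic_features(Xq) @ beta


# Stand-in for an expensive CFD objective (hypothetical test function).
def expensive(x):
    return 1.0 + 2.0 * x[0] - x[1] + 3.0 * x[0] * x[1] + x[1] ** 2


rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(20, 2))  # 20 samples, 6 coefficients
y_train = [expensive(x) for x in X_train]
model = fit_rsm(X_train, y_train)
print(model([0.3, -0.2]))  # approximately [1.66]; exact here because f is quadratic
```

A full quadratic model in $d$ variables has $(d+1)(d+2)/2$ coefficients, so the number of training samples must grow accordingly; for $d = 32$ this gives 561 coefficients. Kriging, RBF, ANN, or SVM surrogates would replace `fit_rsm` with a different fitting step while leaving the surrounding workflow unchanged.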

In the context of aerodynamic shape optimization problems, some researchers
have used surrogate models to reduce the computational time of the optimization
process. The following is a review of some representative research that has been
conducted in this area:



• Lian and Liou [26] addressed the multi-objective optimization of a three-dimen-
sional rotor blade, namely the redesign of the NASA rotor 67 compressor blade,
a transonic axial-flow fan rotor. Two objectives were considered in this case:
(i) maximization of the stage pressure rise, and (ii) minimization of the entropy
generation. A constraint was imposed on the mass flow rate, which had to differ
by less than 0.1% from that of the reference design. The blade geometry was
constructed from airfoil shapes defined at four span stations, with a total of 32
design variables. The authors adopted a MOEA based on MOGA [14] with
real-number encoding. The optimization process was coupled to a second-order
RSM, which was built with 1,024 design candidates using the Improved
Hypercube Sampling (IHS) algorithm. The authors reported that the evaluation
of the 1,024 sampling individuals took approximately 128 hours (5.33 days) us-
ing eight processors and a Reynolds-Averaged Navier-Stokes CFD simulation. In
their experiments, 12 design solutions were selected from the RSM-based Pareto
front, and these solutions were verified with a high-fidelity CFD simulation.
The objective function values differed slightly from those predicted by the
approximation model, but all the selected solutions were better than the
reference design in both objective functions.

• Song and Keane [46] performed the shape optimization of a civil aircraft en-
gine nacelle. The primary goal of the study was to identify the trade-off between
aerodynamic performance and noise effects associated with various geometric
features of the nacelle. For this, two objective functions were defined: (i) scarf
angle, and (ii) total pressure recovery. The nacelle geometry was modeled us-
ing 40 parameters, 33 of which were considered design variables. In their
study, the authors implemented NSGA-II [12] as the multi-objective search
engine, while a commercial CFD software package was used to evaluate the
three-dimensional flow characteristics. A kriging-based surrogate model was
adopted in order to keep the number of designs evaluated with the CFD tool to
a minimum. In their experiments, the authors reported difficulties in obtaining
a reliable Pareto front (there were large discrepancies between two consecutive
Pareto front approximations). They attributed this behavior to the large number
of variables in the design problem, and also to the associated difficulty of
obtaining an accurate kriging model in such situations. To alleviate this, they
performed an analysis of variance (ANOVA) test to find the variables that con-
tributed the most to the objective functions. After this test, they presented results
with a reduced surrogate model employing only 7 decision variables. The au-
thors argued that they obtained a design similar to the previous one, but at a
lower computational cost because of the reduced number of variables in the
kriging model.

• Arabnia and Ghaly [2] presented the aerodynamic shape optimization of turbine
stages in three-dimensional fluid flow, so as to minimize the adverse effects of
three-dimensional flow features on the turbine performance. Two objectives were
considered: (i) maximization of the isentropic efficiency of the stage, and (ii) mini-
mization of the streamwise vorticity. Additionally, constraints were imposed on:
(1) inlet total pressure and temperature, (2) exit pressure, (3) axial chord and
spacing, (4) inlet and exit flow angles, and (5) mass flow rate. The blade ge-
ometry, for both rotor and stator blades, was based on the E/TU-3 turbine, which
is used as a reference design to compare the optimization results. The multi-
objective optimization consisted of finding the best distribution of 2D blade sec-
tions in the radial and circumferential directions. The authors adopted NSGA
[47] as their search engine. Both objective functions were evaluated using a 3D
CFD flow simulation, taking about 10 hours per design candidate. The authors
adopted an artificial neural network (ANN) based surrogate model. The ANN
model, trained with backpropagation, contained a single hidden layer with 50
nodes, and was trained and tested with 23 CFD simulations, sampling the design
space using the Latin hypercube technique. The optimization process was under-
taken by using the ANN model to estimate both the objective functions and the
constraints. Finally, the nondominated solutions obtained were evaluated with the
actual CFD flow simulation. The authors indicated that they were able to obtain
design solutions that were better than the reference turbine design.
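Both Lian and Liou's sampling (IHS) and Arabnia and Ghaly's Latin hypercube design spread a small budget of expensive CFD runs over the design space before the surrogate is trained. The sketch below shows only the basic Latin hypercube construction in plain Python; it does not implement the additional space-filling optimization of IHS, and the function name and seed handling are illustrative assumptions.

```python
import random
from typing import List, Sequence


def latin_hypercube(n_samples: int, lower: Sequence[float],
                    upper: Sequence[float], seed: int = 0) -> List[List[float]]:
    """Basic Latin hypercube sample in the box [lower, upper].

    Each variable's range is split into n_samples equal strata; every stratum
    is used exactly once, and strata are paired across variables at random.
    """
    rng = random.Random(seed)
    columns = []
    for lo, up in zip(lower, upper):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        width = (up - lo) / n_samples
        # One uniformly placed point inside each assigned stratum.
        columns.append([lo + (s + rng.random()) * width for s in strata])
    # Transpose: one row per sample, one column per design variable.
    return [list(row) for row in zip(*columns)]


samples = latin_hypercube(5, lower=[0.0, -1.0], upper=[1.0, 1.0], seed=42)
for row in samples:
    print([round(v, 3) for v in row])
```

Compared with purely random sampling, each one-dimensional projection is guaranteed to cover the whole range of every variable, which matters when only a few dozen CFD runs (23 in Arabnia and Ghaly's case) can be afforded.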

10.3.2.1 Comments Regarding Surrogate-Based Optimization 

The accuracy of the surrogate model depends on the number and distribution of the
samples in the search space, as well as on the choice of an appropriate model to
represent the objective functions and constraints. One important fact is that
Pareto-optimal solutions obtained with the computationally cheap surrogate model
are not necessarily Pareto optimal when evaluated with the real CFD model. So, as
indicated in the previous references, it is necessary to verify the whole set of
Pareto-optimal solutions found with the surrogate, which can make the process very
time consuming. If the discrepancies are large, this condition might attenuate the
benefit of using a surrogate model. The verification process is also needed in order
to update the surrogate model. This latter condition raises the question of how often
the surrogate model should be updated during the design process. There are no
general rules for this, and many researchers rely on previous experience and trial
and error.

CFD analyses rely on the discretization of the flow domain and on numerical models
of the flow equations. In both cases, some sort of reduced model can be used as a
fitness approximation method, which can in turn be used to generate a surrogate
model. For example, Lee et al. [24] use different grid resolutions for the CFD
simulations: coarse grids are used for global exploration, while fine grids are used
for solution exploitation.

Finally, many of the approaches using surrogates build them by relating the design
variables to the objective functions. However, Leifsson and Koziel [25] have
recently proposed the use of physics-based surrogate models, which relate the
design variables to pressure distributions (instead of to objective functions). The
premise behind this approach is that, in aerodynamics, the objective functions are
not directly related to the design variables, but to the pressure distributions. The
authors have presented successful results using this new kind of surrogate model
for global transonic airfoil optimization. Its extension to multi-objective aerody-
namic shape optimization is straightforward and very promising.
10.3.3 Hybrid MOEA Optimization

One of the major drawbacks of MOEAs is that they are very demanding in terms of
computational time, due to the relatively high number of objective function
evaluations that they typically require. This has motivated a number of approaches
to improve their efficiency. One of them consists of hybridizing a MOEA with a
gradient-based method. In general, gradient-based methods converge quickly for
objective functions with simple topologies, but tend to get trapped in local optima
when the objective functions are multimodal. In contrast, MOEAs can normally
avoid local minima and can also cope with complex, noisy objective function
topologies. The basic idea behind this hybridization is to resort to gradient-based
methods whenever the MOEA convergence is slow. Some representative works
using this idea are the following:

• Lian et al. [27] deal with the multi-objective redesign of the blade shape of a
single-stage centrifugal compressor. The objectives are: (i) to maximize the total
head, and (ii) to minimize the input power at a design point. These objectives
conflict with each other. In their hybrid approach, they couple a gradient-based
method that uses a Sequential Quadratic Programming (SQP) scheme with a
GA-based MOEA. The SQP approach works in a confined region of the design
space, where a surrogate model is constructed and optimized with gradient-
based methods. In this hybrid approach, the MOEA is used as a global search
engine, while the SQP model is used as a local search mechanism. Both
mechanisms are used alternately under a trust-region framework until Pareto
optimal solutions are obtained. Through this hybridization, the favorable
characteristics of both global and local search are retained.

• Chung et al. [9] address a multidisciplinary problem involving supersonic busi-
ness jet design. The main objective of this particular problem was to obtain a
trade-off design having good aerodynamic performance while minimizing the
intensity of the sonic boom signature at the ground level. Multiobjective opti- 
mization was used to obtain trade-offs among the objective functions of the prob- 
lem which were to minimize: (i) the aircraft drag coefficient, (ii) initial pressure 
rise (boom overpressure), and (iii) ground perceived noise level. In this study, 
the authors proposed and tested the Gradient Enhanced Multiobjective Genetic 
Algorithm (GEMOGA). The basic idea of this MOEA is to enhance the non- 
dominated solutions obtained by a genetic algorithm with a gradient-based local 
search procedure. One important feature of this approach was that the gradient 
information was obtained from the Kriging model. Therefore, the computational 
cost was not considerably increased. 

• Ray and Tsai [38] considered a multi-objective transonic airfoil shape design
optimization problem with two objectives to be minimized: (i) the ratio of the
drag coefficient to the squared lift coefficient, and (ii) the squared moment
coefficient. Constraints were imposed on the flow Mach number and angle of
attack. The MOEA used is a multi-objective particle swarm optimizer (MOPSO),
which was also hybridized with a gradient-based algorithm. Contrary to standard
hybridization schemes, where gradient-based algorithms are used to improve the
nondominated solutions obtained (i.e., as a local search engine), in this approach
the authors used gradient information to repair solutions that did not satisfy the
equality constraints defined in the problem. This repairing algorithm was based
on the Marquardt-Levenberg algorithm. During the repairing process, a subset
of the design variables was used, instead of the whole set, in order to reduce the
dimensionality of the optimization problem to be solved.
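The repair idea can be sketched for the simplest case of one equality constraint. The Python below is a hypothetical illustration, not Ray and Tsai's implementation: it assumes a forward-difference Jacobian, a fixed list `subset` of variables allowed to move, and the textbook Levenberg-Marquardt rule of shrinking or growing the damping factor `lam` by a factor of 10 after each accepted or rejected step.

```python
from typing import Callable, List, Sequence


def repair_equality(x: Sequence[float], h: Callable[[Sequence[float]], float],
                    subset: Sequence[int], lam: float = 1e-3, tol: float = 1e-10,
                    max_iter: int = 50, fd: float = 1e-7) -> List[float]:
    """Drive one equality constraint h(x) = 0 to zero with a Levenberg-Marquardt
    iteration that moves only the design variables listed in `subset`."""
    x = list(x)
    r = h(x)
    for _ in range(max_iter):
        if abs(r) <= tol:
            break
        # Forward-difference Jacobian row of h w.r.t. the subset variables.
        jac = []
        for i in subset:
            xp = list(x)
            xp[i] += fd
            jac.append((h(xp) - r) / fd)
        jj = sum(j * j for j in jac)
        # LM step for a single residual: (J^T J + lam I) d = -J^T r
        # has the closed form d = -J^T r / (J J^T + lam).
        trial = list(x)
        for i, j in zip(subset, jac):
            trial[i] -= j * r / (jj + lam)
        r_trial = h(trial)
        if abs(r_trial) < abs(r):
            x, r, lam = trial, r_trial, lam * 0.1  # accept; lean toward Gauss-Newton
        else:
            lam *= 10.0  # reject; lean toward a short gradient step
    return x


# Hypothetical constraint x0 + x1 + x2 = 1, repaired using only x0 and x1.
fixed = repair_equality([0.0, 0.0, 0.0], lambda z: z[0] + z[1] + z[2] - 1.0, subset=[0, 1])
print(fixed)  # x2 stays 0.0; x0 and x1 are each moved to about 0.5
```

Restricting the step to `subset` mirrors the dimensionality reduction described above: variables outside the subset are never touched by the repair.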

10.3.3.1 Comments on Hybrid MOEA Optimization 

Experience has shown that hybridizing MOEAs with gradient-based techniques can,
to some extent, increase their convergence rate. However, in the examples presented
above, the gradient information relies on local and/or global surrogate models.
Thus, one major concern is how to build a high-fidelity surrogate model with the
existing designs in the current population, since their distribution in the design
space can introduce some undesired bias in the surrogate model. Additionally, there
are no rules for choosing the number of points used to build the surrogate model,
nor for defining the number of local searches to be performed; these parameters are
chosen empirically. Another idea that has not been explored in multi-objective
evolutionary optimization is to use adjoint-based CFD solutions to obtain gradient
information. Adjoint-based methods are also mature techniques currently used for
single-objective aerodynamic optimization [28], and with these techniques the
gradient of an objective function can be obtained at a cost of about one additional
objective function evaluation, regardless of the number of design variables.
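A generic sketch of the local-refinement step used in such hybrids is shown below, assuming a weighted-sum scalarization of the objectives, plain steepest descent, and central finite differences; none of these choices is taken from the works reviewed above (which rely on SQP, trust regions, or Kriging-based gradients). The finite-difference gradient costs $2n$ evaluations per step, which is precisely why surrogate-based or adjoint-based gradients are attractive when each evaluation is a CFD run.

```python
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def fd_gradient(fun: Objective, x: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central finite-difference gradient; costs 2n evaluations of fun."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((fun(xp) - fun(xm)) / (2.0 * h))
    return grad


def local_refine(x: Sequence[float], objectives: Sequence[Objective],
                 weights: Sequence[float], step: float = 0.1,
                 iters: int = 100) -> List[float]:
    """Steepest descent on a weighted-sum scalarization of the objectives,
    started from a (nondominated) solution supplied by the MOEA."""
    def scalarized(z):
        return sum(w * f(z) for w, f in zip(weights, objectives))

    x = list(x)
    for _ in range(iters):
        g = fd_gradient(scalarized, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Two conflicting hypothetical objectives; every x in [-1, 1] is Pareto optimal.
f1 = lambda z: (z[0] - 1.0) ** 2
f2 = lambda z: (z[0] + 1.0) ** 2
print(local_refine([0.8], [f1, f2], weights=[0.5, 0.5]))  # converges toward [0.0]
```

Changing `weights` moves the refined point along the front; with `[0.8, 0.2]` the same call converges toward x = 0.6.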



10.3.4 Robust Design Optimization 

In aerodynamic optimization, uncertainties in the environment must be taken into
account. For example, the operating velocity of an aircraft may deviate from the
nominal condition during flight. This change in velocity can be large enough to
change the Mach and/or Reynolds number of the flow, and the variation of these
parameters can substantially change the aerodynamic properties of the design. In
this case, a robust optimal solution is desired, instead of the optimal solution found
for ideal operating conditions. By robustness, we generally mean that the perfor-
mance of an optimal solution should be insensitive to small perturbations of the
design variables or environmental parameters. In multi-objective optimization, the
robustness of a solution can be an important factor for a decision maker in choos-
ing the final solution. The search for robust solutions can be treated as a multi-
objective task, i.e., maximizing performance and robustness simultaneously. These
two goals are very likely to conflict, and therefore MOEAs can be employed to
find a number of trade-off solutions. In the context of multi-objective aerodynamic
shape optimization problems, we next summarize some work on robust design.

• Yamaguchi and Arima [51] dealt with the multi-objective optimization of a tran-
sonic compressor stator blade in which three objectives were minimized: (i) pres-
sure loss coefficient, (ii) deviation outflow angle, and (iii) incidence toughness.
The last objective function can be considered a robustness condition for the
design, since it is computed as the average of the pressure loss coefficients at two
off-design incidence angles. The airfoil blade geometry was defined by twelve
design variables. The authors adopted MOGA [14] with real-number encoding
as their search engine. Aerodynamic performance evaluation of the compressor
blade was done using Navier-Stokes CFD simulations. The optimization process
was parallelized using 24 processors in order to reduce the computational time
required.

• Rai [37] dealt with the robust optimal aerodynamic design of a turbine blade
airfoil shape, taking into account the performance degradation due to manufac-
turing uncertainties. The objectives considered were: (i) to minimize the vari-
ance of the pressure distribution over the airfoil's surface, and (ii) to maximize
the probability of constraint satisfaction. Only one constraint was considered, re-
lated to the minimum thickness of the airfoil shape. The author adopted a multi-
objective version of the differential evolution algorithm and used a high-fidelity
CFD simulation on a perturbed airfoil geometry in order to evaluate the aerody-
namic characteristics of the airfoils generated by the MOEA. The geometry used
in the simulation was perturbed following a probability density function observed
for manufacturing tolerances. This process had a high computational cost, which
the author reduced using a neural network surrogate model.

• Shimoyama et al. [44] applied design for multi-objective six sigma (DFMOSS)
[43] to the robust aerodynamic airfoil design of a Mars exploratory airplane.
The aim is to find the trade-off between the optimality of the design and its ro-
bustness. The idea of the DFMOSS methodology was to incorporate a MOEA to
simultaneously optimize the mean value of an objective function while minimiz-
ing its standard deviation due to uncertainties in the operating environment.
The airfoil shape optimization problems considered two cases: a robust design of
(a) airfoil aerodynamic efficiency (lift-to-drag ratio), and (b) the airfoil pitching
moment constraint. In both cases, only the variability of the flow Mach number
was taken into account. The authors adopted MOGA [14] as their search engine.
The airfoil geometry was defined with 12 design variables. The aerodynamic per-
formance of the airfoil was evaluated by CFD simulations using the Favre-averaged
compressible thin-layer Navier-Stokes equations. The authors reported computa-
tional times of about five minutes per airfoil, and about 56 hours for the total
optimization process, using a NEC SX-6 computing system with 32 processors.
Eighteen robust nondominated solutions were obtained in the first test case. Of
this set, almost half attained the 6σ condition. In the second test case, more
robust nondominated solutions were found, and they satisfied a sigma level as
high as 25σ.

• Lee et al. [24] presented the robust design optimization of the ONERA M6 wing
shape. The robust optimization was based on the concept of the Taguchi method,
in which the optimization problem is solved considering uncertainties in the de-
sign environment, in this case the flow Mach number. The problem had two ob-
jectives: (i) minimization of the mean value of an objective function with respect
to variability of the operating conditions, and (ii) minimization of the variance
of the objective function of each candidate solution with respect to its mean
value. In the sample problems, the wing was defined by means of its planform
shape (sweep angle, aspect ratio, taper ratio, etc.) and of the airfoil geometry at
three wing locations (each airfoil shape was defined with a combination of mean
lines and camber distributions), using a total of 80 design variables to define the
wing designs. Geometry constraints were defined by upper and lower limits on
the design variables. The authors adopted the Hierarchical Asynchronous Paral-
lel Multi-Objective Evolutionary Algorithm (HAPMOEA) [15], which is based
on evolution strategies and incorporates the concept of Covariance Matrix
Adaptation (CMA). The aerodynamic evaluation was done with a CFD simula-
tion. Twelve solutions were obtained in the robust design of the wing. All the
nondominated solutions showed better aerodynamic performance (lift-to-drag
ratio) over the range of Mach numbers considered, as compared to the baseline
design. During the evolutionary process, a total of 1,100 individuals were
evaluated in approximately 100 hours of CPU time.

10.3.4.1 Comments on Robust Design Optimization 

As can be seen from the previous examples, robust solutions can be achieved in
evolutionary optimization in different ways. One simple approach is to add pertur-
bations to the design variables or environmental parameters before the fitness is
evaluated, which is known as implicit averaging [50]. An alternative to implicit
averaging is explicit averaging, in which the fitness value of a given design is
averaged over a number of designs generated by adding random perturbations to
the original design. One drawback of the explicit averaging method is the number of
additional quality evaluations needed, which can make the approach impractical. To
tackle this problem, metamodeling techniques have been considered [32].
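The two averaging schemes can be contrasted in a short Python sketch. It assumes independent Gaussian perturbations of the design variables with a user-chosen `sigma` (perturbing environmental parameters such as the Mach number would follow the same pattern). Returning the mean and standard deviation as a pair mirrors the performance-versus-robustness objectives used in the DFMOSS and Taguchi-based studies above, but the code is an illustration, not their implementation.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Fitness = Callable[[Sequence[float]], float]


def perturb(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Add independent Gaussian noise N(0, sigma^2) to each design variable."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def implicit_average(f: Fitness, x: Sequence[float], sigma: float,
                     rng: random.Random) -> float:
    """Implicit averaging: a single evaluation at one perturbed copy of x.
    Averaging happens implicitly through the population over generations."""
    return f(perturb(x, sigma, rng))


def explicit_average(f: Fitness, x: Sequence[float], sigma: float,
                     n_samples: int, rng: random.Random) -> Tuple[float, float]:
    """Explicit averaging: mean and standard deviation of f over n_samples
    perturbed copies of x (costs n_samples evaluations per design)."""
    values = [f(perturb(x, sigma, rng)) for _ in range(n_samples)]
    return statistics.mean(values), statistics.pstdev(values)


rng = random.Random(1)
f = lambda z: z[0] ** 2  # hypothetical fitness, minimized at x = 0
mean, std = explicit_average(f, [0.0], sigma=0.1, n_samples=20000, rng=rng)
print(round(mean, 3), round(std, 3))  # close to sigma^2 = 0.01 and sqrt(2)*sigma^2 ≈ 0.014
```

The `n_samples` factor in `explicit_average` is exactly the extra cost that motivates replacing the true fitness with a metamodel inside the averaging loop.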

10.3.5 Multi-Disciplinary Design Optimization 

Multi-disciplinary design optimization (MDO) aims at incorporating optimization 
methods to solve design problems, considering not only one engineering discipline, 
but a set of them. The optimum of a multidisciplinary problem might be a compro- 
mise solution from the multiple disciplines involved. In this sense, multi-objective 
optimization is well suited for this type of problem, since it can exploit the
interactions between the disciplines and help to find the trade-offs among them. Next,
we present some work in which MOEAs have been used for aerodynamic shape 
optimization problems, coupled with another discipline. 

• Chiba et al. [8] addressed the MDO problem of a wing shape for a transonic
regional-jet aircraft. In this case, three objective functions were minimized: (i)
block fuel for a required airplane mission, (ii) maximum take-off weight, and
(iii) difference in the drag coefficient between transonic and subsonic flight con-
ditions. Additionally, five constraints were imposed, three of which were related
to the wing's geometry and two to the operating conditions, namely the lift coeffi-
cient and the fuel volume required for a predefined aircraft mission. The wing
geometry was defined by 35 design variables. The authors adopted ARMOGA
[40]. The disciplines involved included aerodynamics and structural analysis, and
during the optimization process an iterative aeroelastic solution was generated
in order to minimize the wing weight, with constraints on flutter and strength
requirements. Also, a flight envelope analysis was done, obtaining high-fidelity
Navier-Stokes solutions for various flight conditions. Although the authors used
very small population sizes (eight individuals), about 880 hours of CPU time
were required at each generation, since an iterative process was performed in or-
der to optimize the wing weight subject to aeroelastic and strength constraints.
The population was reinitialized every five generations for range adaptation of
the design variables. In spite of such a reduced population size, the authors were
able to find several nondominated solutions outperforming the initial design.
They also noted that during the evolution, the wing-box weight tended to
increase, but this degrading effect was offset by an increase in aerodynamic
efficiency, giving a reduction in block fuel of over one percent, which would
translate into significant savings in an airline's operational costs.

• Sasaki et al. [41] used MDO for the design of a supersonic wing shape. In this
case, four objective functions were minimized: (i) drag coefficient at transonic
cruise, (ii) drag coefficient at supersonic cruise, (iii) bending moment at the wing
root at the supersonic cruise condition, and (iv) pitching moment at the supersonic
cruise condition. The problem was defined by 72 design variables. Constraints were
imposed on the variable ranges and on the wing section's thickness and camber,
all of them being geometrical constraints. The authors adopted ARMOGA [40],
and the aerodynamic evaluation of the design solutions was done by high-fidelity
Navier-Stokes CFD simulations. No aeroelastic analysis was performed, which
considerably reduced the total computational cost. The objective associated with
the bending moment at the wing root was evaluated by numerical integration of the
pressure distribution over the wing surface, as obtained by the CFD analysis. The
authors indicated that among the nondominated solutions there were designs that
were better in all four objectives than a reference design.

• Lee et al. [23] utilized a generic framework for MDO to explore the improve-
ment of the aerodynamic and radar cross section (RCS) characteristics of an Un-
manned Combat Aerial Vehicle (UCAV). In this application, two disciplines were
considered, the first concerning aerodynamic efficiency, and the second related
to the visual and radar signature of a UCAV airplane. In this case, three
objective functions were minimized: (i) the inverse of the lift/drag ratio at the
ingress condition, (ii) the inverse of the lift/drag ratio at the cruise condition, and
(iii) the frontal area. The number of design variables was approximately 100, and
only side constraints were imposed on the design variables. The first two objective
functions were evaluated using a potential flow CFD solver (FLO22) [17] coupled
to the FRICTION code to obtain the viscous drag using semi-empirical relations.
The authors adopted the Hierarchical Asynchronous Parallel Multi-Objective
Evolutionary Algorithm (HAPMOEA) [15]. The authors reported a processing
time of 200 hours for their approach on a single 1.8 GHz processor. It is important
to consider that HAPMOEA operates with different CFD grid levels (i.e., ap-
proximation levels): coarse, medium, and fine. In this case, the authors adopted
different population sizes for each of these levels. Also, solutions were allowed
to migrate from a low/high fidelity level to a higher/lower one in an island-like
mechanism.

10.3.5.1 Comments on Multidisciplinary Design Optimization 

The increasing complexity of engineering systems has raised the interest in multidisciplinary optimization, as can be seen from the examples presented in this section. For this task, MOEAs facilitate the integration of several disciplines, since they require no information other than the values of the corresponding objective functions, which are usually computed by each discipline through simulations. Additionally, an advantage of using MOEAs for MDO is that they can easily manage any combination of variable types coming from the disciplines involved; e.g., the variables of the aerodynamic discipline can be continuous, while those of the structural optimization may be discrete. Kuhn et al. [22] presented an example of this situation for the multidisciplinary design of an airship. However, one challenge in MDO is the increasing dimensionality of the design space as the number of disciplines increases.

10.3.6 Data Mining and Knowledge Extraction 

Data mining tools, along with graphical data visualization methods, can help to understand and extract information from the data contained in the Pareto optimal solutions found by any MOEA. In this sense, Multi-Objective Design Exploration (MODE), proposed by Jeong et al. IfTEl, is a framework to extract design knowledge from the obtained Pareto optimal solutions, such as trade-off information between conflicting objectives and the sensitivity of each design parameter to the objectives. In the MODE framework, Pareto-optimal solutions are obtained by a MOEA, and knowledge is extracted by analyzing the design parameter values and the objective function values of these solutions using data mining approaches such as Self-Organizing Maps (SOMs) and analysis of variance (ANOVA). Jeong et al. also proposed using rough set theory to obtain rules from the Pareto optimal solutions. MODE has been applied to a wide variety of design optimization problems, as summarized next:

• Jeong et al. ifTHl and Chiba et al. (71 0] explored the trade-offs among four aerodynamic objective functions in the optimization of a wing shape for a Reusable Launch Vehicle (RLV). The objective functions were: (i) the shift of the aerodynamic center between supersonic and transonic flight conditions, (ii) the pitching moment in the transonic flight condition, (iii) the drag in the transonic flight condition, and (iv) the lift in the subsonic flight condition. The first three objectives were minimized while the fourth was maximized. These objectives were selected to address control, stability, range and take-off constraints, respectively. The RLV definition comprised 71 design variables defining the wing planform, the wing position along the fuselage, and the airfoil shape at prescribed spanwise stations. The authors adopted ARMOGA [40], and the aerodynamic evaluation of the RLV was done with Reynolds-Averaged Navier-Stokes CFD simulations. A trade-off analysis was conducted with 102 nondominated individuals generated by the MOEA. Data mining was then used to extract knowledge about the correlation of each design variable with the objective functions: with SOM in 0; with SOM, Batch-SOM, ANOVA and rough sets in J5); and with SOM, Batch-SOM and ANOVA in [18].

• Oyama et al. [35] applied a design exploration technique to extract knowledge from the design of a flapping-wing MAV (Micro Air Vehicle). The flapping motion of the MAV was analyzed using multi-objective design optimization techniques in order to obtain nondominated solutions. These nondominated solutions were further analyzed with SOMs in order to extract knowledge about the effects of the flapping motion parameters on the objective functions. The conflicting objectives considered were: (i) maximization of the time-averaged lift coefficient, (ii) maximization of the time-averaged thrust coefficient, and (iii) minimization of the time-averaged required power coefficient. The problem had five design variables, and the wing geometry was kept fixed. Constraints were imposed on the averaged lift and thrust coefficients so that they were positive. The authors adopted a GA-based MOEA. The objective functions were obtained by means of CFD simulations solving the unsteady incompressible Navier-Stokes equations, and were averaged over one flapping cycle. The purpose of the study was to extract trade-off information relating the objective functions to the flapping motion parameters, such as plunge amplitude and frequency, and pitching angle amplitude and offset.

• Tani et al. [49] solved a multi-objective shape optimization problem for the turbopump blades of a rocket engine, considering three objective functions: (i) shaft power, (ii) entropy rise within the stage, and (iii) angle of attack of the next stage. The first objective was maximized while the others were minimized. The design candidates defined the aerodynamic shape of the turbine blade and consisted of 58 design variables. The authors adopted MOGA [14] as their search engine. The objective function values were obtained from Navier-Stokes CFD flow simulations. The authors reported using SOMs to extract correlation information between the design variables and each objective function.

10.3.6.1 Comments on Data Mining and Knowledge Extraction 

When the data mining techniques of the above examples are adopted, correlating the objective function values with the design parameter values of the Pareto optimal solutions yields some valuable information. However, in many other cases involving aerodynamic flows, the knowledge required is more closely related to the flow physics than to the geometry given by the design variables; an example is understanding the relation between shock wave formation and the aerodynamic characteristics in a transonic airfoil optimization. For this purpose, Oyama et al. [34] have recently proposed a new approach to extract useful design information from one-, two-, and three-dimensional flow data of Pareto-optimal solutions. They analyze the flow data by Proper Orthogonal Decomposition (POD), a statistical approach that extracts the dominant features in the data by decomposing it into a set of optimal orthogonal base vectors of decreasing importance.
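The POD step can be illustrated with a snapshot POD computed through a singular value decomposition. This is a minimal sketch assuming NumPy, in which each row of the hypothetical array `snapshots` holds one flow quantity (for example, a surface pressure distribution) of one Pareto-optimal design; it shows the general technique, not necessarily the exact formulation used by Oyama et al.

```python
import numpy as np

def snapshot_pod(snapshots):
    """Snapshot POD of an (n_designs, n_points) array via the SVD.

    Returns the mean field, the orthonormal POD modes (rows, ordered by
    decreasing energy), the modal amplitudes of every design, and the
    fraction of fluctuation energy captured by each mode.
    """
    snapshots = np.asarray(snapshots, dtype=float)
    mean = snapshots.mean(axis=0)             # mean field over all designs
    fluct = snapshots - mean                  # fluctuations about the mean
    u, s, vt = np.linalg.svd(fluct, full_matrices=False)
    energy = s**2 / np.sum(s**2)              # relative importance of each mode
    amplitudes = u * s                        # amplitude of each mode, per design
    return mean, vt, amplitudes, energy

# Usage with synthetic data standing in for real flow fields
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 50))
mean, modes, amps, energy = snapshot_pod(data)
recon = mean + amps @ modes   # exact reconstruction when all modes are kept
```

Plotting the first few rows of `modes` and the corresponding columns of `amps` against the objective values is what links the dominant flow features to positions on the trade-off surface.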



10.4 A Case Study 

Here, we present a case study of evolutionary multi-objective optimization for an 
airfoil shape optimization problem. The test problem chosen corresponds to the 
airfoil shape of a standard-class glider. The optimization problem aims at obtain- 
ing optimum performance for a sailplane. In this study the trade-off among three 
aerodynamic objectives is evaluated using a MOEA. 



10.4.1 Objective Functions 

Three conflicting objective functions are defined in terms of the average weight and operating conditions of a sailplane [48]. They are formally defined as:

(i) Minimize $C_D/C_L$ subject to $C_L = 0.63$, $Re = 2.04 \cdot 10^6$, $M = 0.12$

(ii) Minimize $C_D/C_L$ subject to $C_L = 0.86$, $Re = 1.63 \cdot 10^6$, $M = 0.10$

(iii) Minimize $C_D/C_L^{3/2}$ subject to $C_L = 1.05$, $Re = 1.29 \cdot 10^6$, $M = 0.08$

In the above definitions, $C_D/C_L$ and $C_D/C_L^{3/2}$ correspond to the inverse of the glider's gliding ratio and to its sink rate, respectively. Both are important performance measures for this aerodynamic optimization problem. $C_D$ and $C_L$ are the drag and lift coefficients. The aim is to maximize the gliding ratio in objectives (i) and (ii), while minimizing the sink rate in objective (iii). Each of these objectives is evaluated at a different prescribed flight condition, given in terms of Mach and Reynolds numbers.
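As a small illustration of how the three objectives are formed from solver output, the sketch below assumes that a flow solver run at the prescribed $C_L$ (for example, Xfoil) has already returned $C_D$ for each flight condition. The names `FLIGHT_CONDITIONS` and `objective_value` are hypothetical; only the numerical conditions come from the definitions above.

```python
# Flight conditions of the three objectives (values taken from the text).
# "cl_exponent" selects C_D/C_L (gliding ratio) or C_D/C_L^(3/2) (sink rate).
FLIGHT_CONDITIONS = {
    "i":   {"cl": 0.63, "reynolds": 2.04e6, "mach": 0.12, "cl_exponent": 1.0},
    "ii":  {"cl": 0.86, "reynolds": 1.63e6, "mach": 0.10, "cl_exponent": 1.0},
    "iii": {"cl": 1.05, "reynolds": 1.29e6, "mach": 0.08, "cl_exponent": 1.5},
}

def objective_value(name, cd, cl, tol=1e-3):
    """Return the objective to be minimized for one flight condition.

    cd and cl are assumed to come from a flow solution computed at the
    condition's Reynolds and Mach numbers with C_L held at its target.
    """
    cond = FLIGHT_CONDITIONS[name]
    if abs(cl - cond["cl"]) > tol:
        raise ValueError(f"C_L={cl} does not match the prescribed {cond['cl']}")
    return cd / cl ** cond["cl_exponent"]
```

The check on `cl` mirrors the fact that each objective is only meaningful at its prescribed lift coefficient.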

10.4.2 Geometry Parameterization 

Finding an optimum representation scheme is an important step for a successful aerodynamic shape optimization task. Several options can be used for airfoil shape parameterization, but a suitable representation should satisfy the following requirements:

(a) The representation needs to be flexible enough to describe any general airfoil shape.

(b) The representation also needs to be efficient, so that the parameterization can be achieved with a minimum number of parameters. Inefficient representations may result in an unnecessarily large design space, which in turn can reduce the search efficiency of an evolutionary algorithm.

(c) The representation should allow the use of any optimization algorithm to perform local search. This requirement is important for refining the solutions obtained by the global search engine more efficiently.



In the present case study, the PARSEC airfoil representation B31l is used. Fig. 10.1 illustrates the 11 basic parameters used by this representation: $r_{le}$, leading edge radius; $X_{up}/X_{lo}$, location of maximum thickness for the upper/lower surfaces; $Z_{up}/Z_{lo}$, maximum thickness for the upper/lower surfaces; $Z_{xx_{up}}/Z_{xx_{lo}}$, curvature of the upper/lower surfaces at the maximum thickness locations; $Z_{te}$, trailing edge coordinate; $\Delta Z_{te}$, trailing edge thickness; $\alpha_{te}$, trailing edge direction; and $\beta_{te}$, trailing edge wedge angle. The modified PARSEC representation adopted here allows the leading edge radius to be defined independently for the upper and lower surfaces, so 12 variables are used in total. Their allowable ranges are defined in Table 10.1.



Table 10.1 Parameter Ranges for Modified PARSEC Airfoil Representation 
Fig. 10.1 PARSEC airfoil parameterization



The PARSEC airfoil geometry representation uses a linear combination of shape 
functions for defining the upper and lower surfaces. These linear combinations are 
given by: 






$$z_{up}(x) = \sum_{n=1}^{6} a_n \, x^{\,n-1/2} \qquad (10.4)$$

$$z_{lo}(x) = \sum_{n=1}^{6} b_n \, x^{\,n-1/2} \qquad (10.5)$$



In the above equations, the coefficients $a_n$ and $b_n$ are determined as functions of the 12 geometric parameters described above by solving two systems of linear equations, one for each surface.
It is important to note that the geometric parameters $r_{le_{up}}/r_{le_{lo}}$, $X_{up}/X_{lo}$, $Z_{up}/Z_{lo}$, $Z_{xx_{up}}/Z_{xx_{lo}}$, $Z_{te}$, $\Delta Z_{te}$, $\alpha_{te}$, and $\beta_{te}$ are the actual design variables in the optimization process, and that the coefficients $a_n$, $b_n$ serve as intermediate variables for interpolating the airfoil's coordinates, which are used by the CFD solver (we used the Xfoil CFD code [13]) for its discretization process.
10.4.3 Constraints

For this case study, five constraints are considered. The first three are defined in terms of the flight speed for each objective function: the prescribed lift coefficients, $C_L = 0.63$ for objective (i), $C_L = 0.86$ for objective (ii), and $C_L = 1.05$ for objective (iii), enable the glider to fly at a given design speed and to produce the lift needed to balance its weight at each design condition. Note that, because the required $C_L$ is prescribed, the corresponding angle of attack $\alpha$ of the airfoil is obtained as an additional variable: given the geometry of a design candidate, the flow solver solves the flow equations subject to the $C_L$ constraint, i.e., it also determines the operating angle of attack $\alpha$. Two additional constraints are defined for the airfoil geometry. First, the maximum airfoil thickness must lie in the range $13.0\% \le t/c \le 13.5\%$. To handle this constraint, every time the evolutionary operators create a new design candidate, its maximum thickness is checked and corrected before the candidate is evaluated. The correction is done by scaling the design parameters $Z_{up}$ and $Z_{lo}$, which mainly define the thickness distribution of the airfoil. In this way, only feasible solutions are evaluated by the simulation process. The final constraint is the trailing edge thickness, whose range is $0.25\% \le \Delta Z_{te} \le 0.5\%$. This constraint is handled directly through the lower and upper bounds of the corresponding design parameter $\Delta Z_{te}$.
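The thickness repair step can be sketched as a single scaling of $Z_{up}$ and $Z_{lo}$. In the case study the maximum thickness would be measured on the actual airfoil coordinates; this minimal sketch instead assumes, for illustration only, that $t/c \approx Z_{up} - Z_{lo}$. The function name `repair_thickness` is hypothetical.

```python
T_MIN, T_MAX = 0.130, 0.135  # allowed maximum thickness t/c (from the text)

def repair_thickness(z_up, z_lo, t_min=T_MIN, t_max=T_MAX):
    """Scale Z_up and Z_lo so the estimated maximum thickness lies in range.

    Assumption for illustration: t/c is estimated as Z_up - Z_lo, which
    ignores that the two surface maxima usually occur at different x.
    """
    thickness = z_up - z_lo
    if thickness <= 0:
        raise ValueError("upper surface must lie above lower surface")
    target = min(max(thickness, t_min), t_max)   # nearest feasible thickness
    scale = target / thickness                   # 1.0 when already feasible
    return z_up * scale, z_lo * scale
```

Because feasible candidates are returned unchanged and infeasible ones are moved to the nearest bound, every candidate that reaches the flow solver satisfies the thickness constraint, which is the point of repairing before evaluation.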



10.4.4 Evolutionary Algorithm 

For solving the above case study, we adopted MODE-LD+SS as our search algo- 
rithm. Additionaly, and for comparison purposes, we also used an implementation 
of the SMS-EMOA algorithm [ 5 ] . This algorithm is based on the hypervolume per- 
formance measure [53 1 and has also been used in the context of airfoil optimization 
problems. 

The multi-objective evolutionary algorithm MODE-LD+SS (see Algorithm 1) [3] adopts the evolutionary operators of differential evolution (DE) [36]. In the basic DE algorithm, during the offspring creation stage, for each current vector $P_i \in \{P\}$, three mutually different parents $u_1, u_2, u_3 \in \{P\}$ ($u_1 \neq u_2 \neq u_3 \neq P_i$) are randomly selected to create a mutant vector $v$ using the following mutation operation:

$$v \leftarrow u_1 + F \cdot (u_2 - u_3) \qquad (10.8)$$

where $F > 0$ is a real constant scaling factor that controls the amplification of the difference $(u_2 - u_3)$. Using this mutant vector, a new offspring $P'_i$ (also called the trial vector in DE) is created by crossing over the mutant vector $v$ and the current solution $P_i$ according to:

$$p'_j = \begin{cases} v_j & \text{if } rand_j(0,1) < CR \text{ or } j = j_{rand} \\ p_j & \text{otherwise} \end{cases} \qquad (10.9)$$
Algorithm 1 MODE-LD+SS
INPUT:
    P[1,...,N] = Population
    N = Population size
    F = Scaling factor
    CR = Crossover rate
    λ[1,...,N] = Weight vectors
    NB = Neighborhood size
    GMAX = Maximum number of generations
OUTPUT:
    PF = Pareto front approximation
Begin
    g ← 0
    Randomly create P_i^g, i = 1,...,N
    Evaluate P_i^g, i = 1,...,N
    while g < GMAX do
        {LND} ← ∅
        for i = 1 to N do
            DetermineLocalDominance(P_i^g, NB)
            if P_i^g is locally nondominated then
                {LND} ← {LND} ∪ P_i^g
            end if
        end for
        for i = 1 to N do
            Randomly select u_1, u_2, and u_3 from {LND}
            v ← CreateMutantVector(u_1, u_2, u_3)
            P_i^{g+1} ← Crossover(P_i^g, v)
            Evaluate P_i^{g+1}
        end for
        Q ← P^g ∪ P^{g+1}
        Determine z* for Q
        for i = 1 to N do
            P_i^{g+1} ← MinimumTchebycheff(Q, λ^i, z*)
            Q ← Q \ P_i^{g+1}
        end for
        PF ← {P}^{g+1}
        g ← g + 1
    end while
    Return PF
End



In the above expression, the index $j$ refers to the $j$th component of the decision variable vectors. $CR$ is a positive constant and $j_{rand}$ is a randomly selected integer in the range $[1, \ldots, D]$ (where $D$ is the dimension of the solution vectors), which ensures that the offspring differs from the current solution $P_i$ in at least one component. The above DE variant is known as rand/1/bin, and it is the version adopted here. Additionally, the proposed algorithm incorporates two mechanisms for improving both the convergence towards the Pareto front and the uniform distribution of nondominated solutions along it. These mechanisms are the concept of local dominance and the use of an environmental selection based on a scalar function. Below, we explain these two mechanisms in more detail.
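Before turning to these mechanisms, the mutation and crossover operators of rand/1/bin described above can be sketched as follows. Because MODE-LD+SS draws the three parents from the locally nondominated set {LND} rather than from the whole population, the sketch takes the parent pool as an argument; the function name `rand1bin` is hypothetical.

```python
import random

def rand1bin(target, pool, F=0.5, CR=0.5, rng=random):
    """Create one DE trial vector with the rand/1/bin scheme.

    target : current solution P_i (list of floats)
    pool   : candidate parents, assumed not to contain target; in
             MODE-LD+SS this would be the locally nondominated set {LND}
    """
    if len(pool) < 3:
        raise ValueError("rand/1/bin needs at least three candidate parents")
    u1, u2, u3 = rng.sample(pool, 3)                       # mutually different parents
    v = [a + F * (b - c) for a, b, c in zip(u1, u2, u3)]  # mutant vector, eq. (10.8)
    j_rand = rng.randrange(len(target))                    # forces one component from v
    return [v[j] if (rng.random() < CR or j == j_rand) else target[j]
            for j in range(len(target))]                   # binomial crossover
```

With `CR = 0` exactly one component (index `j_rand`) is taken from the mutant vector, and with `CR = 1` the trial vector equals the mutant vector.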

The first mechanism, local dominance, works as follows. In Algorithm 1, the solution vectors $u_1, u_2, u_3$ required for creating the mutant vector $v$ in equation (10.8) are selected from the current population only if they are locally nondominated in their neighborhood $\mathcal{N}$. Local dominance is defined as follows:

Definition 6. Pareto Local Dominance. Let $x$ be a feasible solution, $\mathcal{N}(x)$ be a neighborhood structure for $x$ in the decision space, and $f(x)$ a vector of objective functions.

- We say that a solution $x$ is locally nondominated with respect to $\mathcal{N}(x)$ if and only if there is no $x' \in \mathcal{N}(x)$ such that $f(x') \prec f(x)$.

The neighborhood structure is defined as the NB closest individuals to a particular solution, where closeness is measured by the Euclidean distance between solutions in the design variable space. The main aim of the local dominance concept, as defined above, is to exploit the genetic information of good individuals when creating DE trial vectors and the associated offspring, which may help to improve the MOEA's convergence rate toward the Pareto front. From Algorithm 1 it can be noted that this mechanism has a stronger effect during the earlier generations, when the proportion of nondominated individuals in the global population is low, and that it progressively weakens as the number of nondominated individuals grows during the evolutionary process. The mechanism is automatically switched off once all the individuals in the population become nondominated, and it can be switched on again as some individuals become dominated.
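Following Definition 6, the set {LND} of Algorithm 1 can be sketched as below, assuming minimization of all objectives. The names `dominates` and `locally_nondominated` are hypothetical, and this brute-force version sorts all distances for every solution, which is affordable for the population size used here.

```python
import math

def dominates(fa, fb):
    """Pareto dominance for minimization: fa is no worse everywhere and better somewhere."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def locally_nondominated(X, F, nb):
    """Indices of solutions not dominated by any of their nb nearest neighbors.

    X : decision vectors (neighborhoods use Euclidean distance in this space)
    F : objective vectors, F[k] = f(X[k])
    """
    lnd = []
    for i, xi in enumerate(X):
        others = [k for k in range(len(X)) if k != i]
        neighbors = sorted(others, key=lambda k: math.dist(xi, X[k]))[:nb]
        if not any(dominates(F[k], F[i]) for k in neighbors):
            lnd.append(i)
    return lnd
```

A globally dominated solution can still be locally nondominated if no dominating solution lies among its NB nearest neighbors; as NB approaches N, the set reduces to the ordinary nondominated set.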

The second mechanism, selection based on a scalar function, uses the Tchebycheff scalarization function given by:

$$g(x \mid \lambda, z^*) = \max_{1 \le j \le m} \{ \lambda_j \, |f_j(x) - z_j^*| \} \qquad (10.10)$$

In the above equation, $\lambda^i$, $i = 1, \ldots, N$, represents the set of weight vectors used to distribute the solutions along the entire Pareto front. In this case, this set is calculated using the procedure described in [52]. $z^*$ is a reference point defined in objective space and determined from the minimum objective values of the combined population $Q$, which consists of the current parents and the created offspring. This reference point is updated at each generation as the evolution progresses. The procedure MinimumTchebycheff($Q$, $\lambda^i$, $z^*$) finds, from the set $Q$, the solution vector that minimizes equation (10.10) for the weight vector $\lambda^i$ and the reference point $z^*$.
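The environmental selection of Algorithm 1 can be sketched as one Tchebycheff minimization per weight vector, removing each selected solution from $Q$ before the next one is chosen. This is a minimal sketch assuming the weight vectors are already given (their construction following [52] is not reproduced), and the function names are hypothetical.

```python
def tchebycheff(f, lam, z_star):
    """Tchebycheff scalarization g(x | lambda, z*) of equation (10.10)."""
    return max(l * abs(fj - zj) for l, fj, zj in zip(lam, f, z_star))

def minimum_tchebycheff_selection(Q, weights):
    """Pick one solution of Q per weight vector, without replacement.

    Q       : objective vectors of the combined parents + offspring
    weights : weight vectors lambda^1..lambda^N (len(weights) <= len(Q))
    Returns the indices of the selected solutions, in weight order.
    """
    m = len(Q[0])
    z_star = [min(q[j] for q in Q) for j in range(m)]  # minimum of each objective over Q
    remaining = list(range(len(Q)))
    selected = []
    for lam in weights:
        best = min(remaining, key=lambda k: tchebycheff(Q[k], lam, z_star))
        selected.append(best)
        remaining.remove(best)                          # Q <- Q \ P_i^{g+1}
    return selected
```

Removing each winner from the candidate set is what keeps the N survivors distinct and spread across the weight vectors.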

The second MOEA adopted is SMS-EMOA, a steady-state algorithm based on two basic characteristics: (1) nondominated sorting is used as its ranking criterion, and (2) the hypervolume⁴ is applied as its selection criterion to discard the individual that contributes the least hypervolume to the worst-ranked front.

The basic algorithm is described in Algorithm 2. Starting with an initial population of $\mu$ individuals, a new individual is generated by means of randomised variation operators. We adopted simulated binary crossover (SBX) and polynomial-based mutation as described in [11]. The new individual becomes a member of the next population if replacing another individual with it leads to a higher quality of the population with respect to the hypervolume.

Algorithm 2 SMS-EMOA
    P_0 ← init()                          /* initialize random population of μ individuals */
    t ← 0
    repeat
        q_{t+1} ← generate(P_t)           /* generate offspring by variation */
        P_{t+1} ← Reduce(P_t ∪ {q_{t+1}}) /* select μ best individuals */
        t ← t + 1
    until termination condition is fulfilled



The procedure Reduce used in Algorithm 2 selects the $\mu$ individuals of the subsequent population; its definition is given in Algorithm 3. The fast-nondominated-sort algorithm used in NSGA-II [12] is applied to partition the population into $v$ sets $\mathcal{R}_1, \ldots, \mathcal{R}_v$. These subsets are called fronts and are given an index representing a hierarchical order (the level of domination), whereas the solutions within each front are mutually nondominated. The first subset contains all nondominated solutions of the original set $Q$. The second front consists of the individuals that are nondominated in the set $(Q \setminus \mathcal{R}_1)$, i.e., each member of $\mathcal{R}_2$ is dominated by at least one member of $\mathcal{R}_1$. More generally, the $i$th front consists of the individuals that are nondominated once the individuals of all fronts $j$ with $j < i$ have been removed from $Q$.



Algorithm 3 Reduce(Q)
    {R_1, ..., R_v} ← fast-nondominated-sort(Q)   /* all v fronts of Q */
    r ← argmin_{s ∈ R_v} [Δ_S(s, R_v)]            /* s ∈ R_v with lowest Δ_S(s, R_v) */
    return (Q \ {r})



The value of $\Delta_S(s, \mathcal{R}_v)$ can be interpreted as the exclusive contribution of $s$ to the hypervolume value of its front. By definition of $\Delta_S(s, \mathcal{R}_v)$, an individual that dominates another is always kept, and a nondominated individual is never replaced by a dominated one. This measure keeps those individuals that maximize the population's S-metric value, which implies that the covered hypervolume of a population cannot decrease by application of the Reduce operator. Thus, for Algorithm 2 the following invariant holds:

$$S(P_t) \le S(P_{t+1}) \qquad (10.11)$$

⁴ The hypervolume (also known as the S-metric or the Lebesgue measure) of a set of solutions measures the size of the portion of objective space that is dominated by those solutions collectively.

Due to the high computational effort of the hypervolume calculation, a steady-state selection scheme is used. Since only one individual is created, only one has to be deleted from the population at each generation. Thus, the selection operator has to compute at most $\mu + 1$ values of the S-metric (exactly $\mu + 1$ values in case all solutions are nondominated). These are the S-metric values of the subsets of the worst-ranked front obtained by leaving out one point of the front at a time. A $(\mu + \lambda)$ selection scheme would require the calculation of $\binom{\mu+\lambda}{\mu}$ possible S-metric values to identify an optimally composed population, maximising the S-metric net value.
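The Reduce step can be sketched compactly for the special case of two objectives, where the exclusive hypervolume contribution of each point of a front has a closed form given by its sorted neighbors. This restriction is an assumption made for brevity; the three-objective case study needs a general hypervolume routine. All names are hypothetical, and the reference point `ref` is assumed to be worse than every point in both objectives.

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Partition point indices into fronts R_1, ..., R_v by simple repeated peeling."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def contributions_2d(points, front, ref):
    """Exclusive hypervolume contribution of each point of a 2-objective front."""
    order = sorted(front, key=lambda i: points[i][0])  # f1 ascending => f2 descending
    contrib = {}
    for pos, i in enumerate(order):
        right = points[order[pos + 1]][0] if pos + 1 < len(order) else ref[0]
        above = points[order[pos - 1]][1] if pos > 0 else ref[1]
        contrib[i] = (right - points[i][0]) * (above - points[i][1])
    return contrib

def reduce_population(points, ref):
    """Drop the point of the worst front with the smallest contribution (Algorithm 3)."""
    worst = nondominated_fronts(points)[-1]
    contrib = contributions_2d(points, worst, ref)
    r = min(worst, key=contrib.get)
    return [p for k, p in enumerate(points) if k != r]
```

A dominated offspring always lands in a worse front than the solutions dominating it, so it is the first candidate for removal, which is consistent with the invariant (10.11).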

The parameters used for solving the present case study were set as follows: $N = 120$ (population size) for both MOEAs; $F = 0.5$ (mutation scaling factor), $CR = 0.5$ (crossover rate) and $NB = 5$ (neighborhood size) for MODE-LD+SS; and $\eta_m = 20$ (distribution index for polynomial mutation) and $\eta_c = 15$ (distribution index for SBX) for SMS-EMOA.
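To show what the distribution indices $\eta_c$ and $\eta_m$ control, the sketch below implements the basic single-variable forms of SBX and polynomial mutation. It omits the bound-aware probability distributions and the per-variable application probabilities used in later implementations, so it is an assumption-laden illustration rather than the exact operators of [11]; the function names are hypothetical.

```python
import random

def sbx_pair(p1, p2, eta_c=15.0, rng=random):
    """Simulated binary crossover for one variable (basic, unbounded form)."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2

def polynomial_mutation(x, lower, upper, eta_m=20.0, rng=random):
    """Polynomial mutation for one variable (basic form), clipped to the bounds."""
    r = rng.random()
    if r < 0.5:
        delta = (2.0 * r) ** (1.0 / (eta_m + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - r)) ** (1.0 / (eta_m + 1.0))
    return min(max(x + delta * (upper - lower), lower), upper)
```

Larger indices concentrate the children near the parents: SBX children are spread symmetrically about the parents' mean (their sum equals the parents' sum), and the mutation step shrinks as $\eta_m$ grows.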



10.4.5 Results 

Both MODE-LD+SS and SMS-EMOA were run for 100 generations. The simulation process in each case took approximately 8 hours of CPU time. Five independent runs were executed to extract some statistics. Figs. 10.2 and 10.3 show the Pareto front approximations (of the median run) at different stages of the evolution. For comparison purposes, the corresponding objective function values of a reference airfoil (a720o [48]) are also plotted in these figures. At t = 10 generations (the corresponding figure is not shown due to space constraints), the number of nondominated solutions is 26 for SMS-EMOA and 27 for MODE-LD+SS. With this small number of nondominated solutions, it is difficult to identify the trade-off surface for this problem. However, as the number of generations increases, the trade-off surface is more clearly revealed. At t = 50 generations (see Fig. 10.2), the number of nondominated solutions is 120 for SMS-EMOA and 91 for MODE-LD+SS. At this point, the trade-off surface shows a steeper variation of objective (iii) toward the compromise region of the Pareto front. The trade-off surface also shows a plateau where the third objective varies little with respect to the other objectives. Finally, at t = 100 generations (see Fig. 10.3), the shape of the trade-off surface is more clearly defined, and a clear trade-off among the three objectives is evident. Note in Fig. 10.3 that the trade-off surface shows some void regions. This feature is captured by both MOEAs and is attributed to the constraints defined on the airfoil geometry. Table 10.2 summarizes the maximum possible improvement with respect to the reference solution that can be attained for each objective by each MOEA.

Table 10.2 Maximum improvement per objective for the median run of each MOEA used

              SMS-EMOA                        MODE-LD+SS
Gen     ΔObj1(%)  ΔObj2(%)  ΔObj3(%)    ΔObj1(%)  ΔObj2(%)  ΔObj3(%)
10      11.43     10.19     5.43        11.93     10.38     5.47
50      12.84     10.67     6.06        13.22     10.67     6.21
100     12.75     10.79     6.28        13.63     10.80     6.40

In the context of MOEAs, it is common to compare results on the basis of performance measures. Next, for comparison purposes between the algorithms used, we present the hypervolume values attained by each MOEA, as well as the values of the two set coverage performance measure C-M(A,B) between them. The two performance measures are defined as follows:



Hypervolume (Hv): Given a Pareto approximation set $PF_{known}$ and a reference point $z_{ref}$ in objective space, this performance measure estimates the hypervolume attained by the set. This hypervolume corresponds to the non-overlapping volume of all the hyperrectangles formed by the reference point $z_{ref}$ and every vector in the Pareto set approximation. It is mathematically defined as:

$$HV = \mathrm{volume}\Big( \bigcup_{i=1}^{|PF_{known}|} vol_i \Big), \quad vec_i \in PF_{known} \qquad (10.12)$$

where $vec_i$ is a nondominated vector from the Pareto set approximation, and $vol_i$ is the volume of the hyperrectangle formed by the reference point and the nondominated vector $vec_i$. Here, the reference point $z_{ref}$ in objective space for the three-objective problem was set to (0.007610, 0.005895, 0.005236), which corresponds to the objective values of the reference airfoil. High values of this measure indicate that the solutions are closer to the true Pareto front and that they cover a wider extent of it.
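Equation (10.12) can be evaluated exactly for small sets by cutting the box below $z_{ref}$ into grid cells whose edges are the point coordinates and summing the cells that some point dominates. This is a minimal sketch of that general technique (minimization assumed; the name `hypervolume` is hypothetical); its cost grows as $O(n^m)$ cells, so dedicated algorithms are preferable for large fronts.

```python
import itertools

def hypervolume(points, ref):
    """Exact hypervolume (minimization) by decomposing the box into grid cells."""
    pts = [p for p in points if all(a < r for a, r in zip(p, ref))]  # ignore points outside the ref box
    if not pts:
        return 0.0
    m = len(ref)
    # Grid lines along each axis: every point coordinate plus the reference value
    axes = [sorted({p[d] for p in pts} | {ref[d]}) for d in range(m)]
    total = 0.0
    for cell in itertools.product(*(range(len(a) - 1) for a in axes)):
        lower = [axes[d][cell[d]] for d in range(m)]
        # A cell is dominated iff some point is <= its lower corner in every objective
        if any(all(p[d] <= lower[d] for d in range(m)) for p in pts):
            volume = 1.0
            for d in range(m):
                volume *= axes[d][cell[d] + 1] - lower[d]
            total += volume
    return total
```

For the case study one would pass the objective vectors of a run together with `ref = (0.007610, 0.005895, 0.005236)`.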



Two Set Coverage (C-Metric): This performance measure estimates the coverage 
proportion, in terms of percentage of dominated solutions, between two sets. Given 
the sets A and B, both containing only nondominated solutions, the C-Metric is 
mathematically defined as: 



$$C(A,B) = \frac{\left|\{u \in B \mid \exists\, v \in A : v \text{ dominates } u\}\right|}{|B|} \qquad (10.13)$$



This performance measure indicates the proportion of vectors in B that are dominated by at least one vector in A. The sets A and B correspond to the Pareto approximations obtained by two different algorithms; therefore, the C-metric is used for pairwise comparisons between algorithms. Since C(A,B) is in general not equal to 1 - C(B,A), both directions must be computed.
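Equation (10.13) translates directly into code. This minimal sketch assumes minimization of all objectives; the names `dominates` and `coverage` are hypothetical.

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def coverage(A, B):
    """Two set coverage C(A, B): fraction of B dominated by at least one member of A."""
    if not B:
        raise ValueError("B must be non-empty")
    return sum(1 for u in B if any(dominates(v, u) for v in A)) / len(B)
```

Computing both `coverage(A, B)` and `coverage(B, A)` is necessary because the measure is not symmetric.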

For the hypervolume measure, SMS-EMOA attains a value of $Hv = 1.5617 \times 10^{-10}$ with a standard deviation of $\sigma = 2.4526 \times 10^{-12}$, while MODE-LD+SS attains a value of $Hv = 1.6043 \times 10^{-10}$ with a standard deviation of $\sigma = 1.2809 \times 10^{-12}$. These results are the averages of five independent runs of each algorithm.
Fig. 10.2 Pareto front approximation at Gen = 50 (6000 OFEs) (series: SMS-EMOA, MODE-LD+SS, reference airfoil a720o)

Fig. 10.3 Pareto front approximation at Gen = 100 (12000 OFEs) (series: SMS-EMOA, MODE-LD+SS, reference airfoil a720o)



As for the C-metric, the values obtained are C-M(SMS-EMOA, MODE-LD+SS) = 0.07016 with a standard deviation of σ = 0.03134, and C-M(MODE-LD+SS, SMS-EMOA) = 0.3533 with a standard deviation of σ = 0.0510. These results are the averages over all pairwise combinations of the five independent runs of each algorithm. Our results indicate that MODE-LD+SS converges closer to the true Pareto front and provides more nondominated solutions than SMS-EMOA.

Fig. 10.4 Airfoil shape comparison

Finally, Fig. 10.4 presents the geometries of the reference airfoil a720o and of two airfoils selected from the trade-off surfaces obtained by SMS-EMOA and MODE-LD+SS at t = 100 generations. These two airfoils were selected as the ones closest to the origin of the objective space, since they are considered to represent the best trade-off solutions.
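The selection rule above amounts to taking the objective vector with the smallest Euclidean norm. The text does not mention any normalization, so this minimal sketch uses raw objective values, which is plausible here because the three objectives have similar magnitudes; the name `closest_to_origin` is hypothetical.

```python
import math

def closest_to_origin(front):
    """Return the objective vector with the smallest Euclidean norm."""
    return min(front, key=lambda f: math.sqrt(sum(v * v for v in f)))
```

If the objectives differed by orders of magnitude, normalizing each one (for example, by the reference airfoil's values) before computing the distance would keep a single objective from dominating the choice.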

10.5 Conclusions and Final Remarks 



In this chapter we have presented a brief review of the research done on multi-objective aerodynamic shape optimization. The examples presented cover a wide range of current applications of these techniques in aeronautical engineering design, in several design scenarios. The approaches reviewed include the use of surrogates, hybridizations with gradient-based techniques, mechanisms to search for robust solutions, multidisciplinary approaches, and knowledge extraction techniques. Several Pareto-based MOEAs have been successfully integrated into industrial problems. It can be anticipated that the use of these techniques will become standard practice in the near future, as the available computing power continues to increase each year. It is also worth noting that MOEAs are flexible enough to be coupled to both engineering models and low-order physics-based models without major changes. They can also be easily parallelized, since MOEAs normally have low data dependency.

From an algorithmic point of view, it is clear that Pareto-based MOEAs remain a popular choice in the previous group of applications. It is also evident that, when dealing with expensive objective functions such as those of the above applications, a careful statistical analysis of the parameters is unaffordable. Thus, the parameters of such MOEAs were simple guesses or taken from values suggested by other researchers. Surrogate models also appear in these costly applications. However, simpler techniques such as fitness inheritance or fitness approximation [39] seem to be uncommon in this domain and could be a good alternative when dealing with high-dimensional problems. Additionally, the authors of this group of applications have relied on very simple constraint-handling techniques, most of which discard infeasible individuals. Alternative approaches exist that can exploit information from infeasible solutions and can perform a more sophisticated exploration of the search space when dealing with constrained problems (see for example [29]), and this has not been properly studied yet. Finally, it is worth emphasizing that, in spite of the difficulty of these problems and of the evident limitations of MOEAs in dealing with them, most authors report finding improved designs when using MOEAs, even though a fairly small number of fitness function evaluations was allowed in all cases. This clearly illustrates the high potential of MOEAs in this domain.
Chapter 11

An Enhanced Support Vector Machines Model for Classification and Rule Generation

Ping-Feng Pai and Ming-Fu Hsu 



Abstract. Based on statistical learning theory, the support vector machine (SVM) is an emerging machine learning technique for solving classification problems involving small samples, nonlinearity and high dimensionality. Data preprocessing, parameter selection, and rule generation strongly influence the performance of SVM models. Thus, the main purpose of this chapter is to propose an enhanced support vector machines (ESVM) model that integrates data preprocessing, parameter selection and rule generation into an SVM model, and to apply the ESVM model to real-world problems. This chapter is organized as follows. Section 11.1 presents the purpose of classification and the basic concept of SVM models. Sections 11.2 and 11.3 introduce data preprocessing techniques and metaheuristics for selecting SVM models, respectively. Rule extraction from SVM models is addressed in Section 11.4. An enhanced SVM scheme and numerical results are presented in Sections 11.5 and 11.6. Conclusions are drawn in Section 11.7.

Keywords: Support vector machines, Data preprocessing, Rule extraction, 
Classification. 



11.1 Basic Concept of Classification and Support Vector Machines

Data mining techniques examine large numbers of records containing information about target and input variables. Imagine that investors would like to classify a firm's financial status based on characteristics of the firm, such as return on assets (ROA), quick ratio, and return on investment (ROI). This is a classification task, and data mining techniques are suitable for it. The goal of data mining is to build a suitable model for a labeling process that approximates the original process as closely as possible. Investors can then use the well-developed model to infer the status of a firm.

Support vector machines (SVM) were originally proposed by Vapnik [42, 43] for binary classification problems. The SVM implements the structural risk minimization (SRM) principle rather than the empirical risk minimization (ERM) principle employed by most traditional neural network models. The key concept of SRM is to minimize an upper bound on the generalization error instead of the training error. In addition, training an SVM is equivalent to solving a linearly constrained quadratic programming (QP) problem, so the SVM solution is always unique and globally optimal [6, 12, 14, 41, 42, 43].

Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, m$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$, the SVM determines an optimal separating hyperplane with the maximum margin by solving the following optimization problem:

$$\min_{w,\, g} \; \frac{1}{2} w^T w \quad \text{s.t.} \quad y_i (w \cdot x_i + g) - 1 \ge 0, \quad i = 1, \ldots, m \qquad (11.1)$$

where w denotes the weight vector, and g denotes the bias term. 

The solution of this quadratic optimization problem is given by the saddle point of the Lagrange function:

$$L_P(w, g, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i \left[ y_i (w \cdot x_i + g) - 1 \right] \qquad (11.2)$$

where $\alpha_i \ge 0$ are the Lagrange multipliers.

Finding this saddle point is necessary because $L_P$ must be minimized with respect to the primal variables $w$ and $g$ and maximized with respect to the non-negative dual variables $\alpha_i$. By differentiating $L_P$ with respect to $w$ and $g$ and applying the Karush-Kuhn-Tucker (KKT) conditions for the optimum of the constrained function, $L_P$ is transformed into the dual Lagrangian $L_D(\alpha)$:

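$$\max_{\alpha} \; L_D(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \ldots, m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (11.3)$$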
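The differentiation step can be made explicit. As a sketch of the standard derivation, setting the partial derivatives of $L_P$ in Eq. (11.2) to zero gives

$$\frac{\partial L_P}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial g} = -\sum_{i=1}^{m} \alpha_i y_i = 0.$$

Substituting both conditions back into $L_P$, the term in $g$ vanishes and the two quadratic terms combine:

$$L_P = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_{i=1}^{m} \alpha_i = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle,$$

which is the objective of Eq. (11.3); the condition $\sum_i \alpha_i y_i = 0$ becomes its equality constraint.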

The dual Lagrangian $L_D(\alpha)$ must be maximized with respect to the non-negative $\alpha_i$ to identify the optimal hyperplane. The parameters $w^*$ and $g^*$ of the optimal hyperplane are then determined from the solution $\alpha_i^*$ of the dual optimization problem. Therefore, the optimal decision function $f(x) = \operatorname{sign}(w^* \cdot x + g^*)$ can be written as:
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i^* \langle x_i, x \rangle + g^* \right) \qquad (11.4)$$



In a binary classification task, usually only a small subset of the Lagrange multipliers $\alpha_i$ is greater than zero. The corresponding training vectors are the ones closest to the optimal hyperplane. These training vectors with non-zero $\alpha_i$ are called support vectors, because the optimal decision function $f(x, \alpha^*, g^*)$ depends on them exclusively. Figure 11.1 illustrates the basic structure of the SVM.
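To make Eq. (11.4) and the role of support vectors concrete, the sketch below uses a toy problem that can be solved by hand: $x_1 = (1, 1)$ with $y_1 = +1$ and $x_2 = (-1, -1)$ with $y_2 = -1$. By symmetry $\alpha_1 = \alpha_2 = \alpha$, the dual objective of Eq. (11.3) reduces to $2\alpha - 4\alpha^2$, which is maximized at $\alpha = 0.25$. The data, the extra non-support point and the helper names are illustrative assumptions. The code does not solve the QP; it only recovers $w^* = \sum_i \alpha_i^* y_i x_i$ and $g^*$ from given multipliers and evaluates the decision function.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def primal_from_dual(xs: List[Vector], ys: List[int],
                     alphas: List[float]) -> Tuple[List[float], float]:
    """Recover w* = sum_i alpha_i y_i x_i and the bias g* from one support vector."""
    dim = len(xs[0])
    w = [sum(a * y * x[k] for a, y, x in zip(alphas, ys, xs)) for k in range(dim)]
    # A support vector (alpha_s > 0) satisfies y_s (w . x_s + g) = 1; since y_s = +/-1, g = y_s - w . x_s
    s = next(i for i, a in enumerate(alphas) if a > 0)
    g = ys[s] - dot(w, xs[s])
    return w, g


def decision(x: Vector, xs: List[Vector], ys: List[int],
             alphas: List[float], g: float) -> int:
    """Eq. (11.4): sign(sum_i y_i alpha_i <x_i, x> + g); a score of exactly 0 is mapped to +1."""
    score = sum(y * a * dot(xi, x) for xi, y, a in zip(xs, ys, alphas)) + g
    return 1 if score >= 0 else -1


# Toy data: the third point lies well outside the margin, so its multiplier is zero
xs = [(1.0, 1.0), (-1.0, -1.0), (2.0, 3.0)]
ys = [1, -1, 1]
alphas = [0.25, 0.25, 0.0]  # hand-solved dual optimum, not the output of a QP solver

w, g = primal_from_dual(xs, ys, alphas)
support_vectors = [i for i, a in enumerate(alphas) if a > 0]
```

Only the first two points carry non-zero multipliers, so deleting the third point would leave the hyperplane unchanged.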

Very few data sets in the real world are linearly separable. What makes SVM so remarkable is that the basic linear framework is easily extended to the case where the data set is not linearly separable. The fundamental concept behind this extension is to transform the input space, where the data set is not linearly separable, into a higher-dimensional space where the data become linearly separable. Figure 11.2 illustrates the mapping concept of SVM.
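A one-dimensional toy case shows why such a mapping helps. The labels below (+1 at the outer points, -1 at the inner points) cannot be separated by any single threshold on $x$. After the assumed mapping $\Phi(x) = (x, x^2)$, however, the second coordinate separates them with the hyperplane $x^2 = 2.5$. The data, the map and the threshold are illustrative choices and do not come from the chapter.

```python
from typing import List, Tuple

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]


def separable_by_threshold(values: List[float], labels: List[int]) -> bool:
    """True if one threshold on the values reproduces the labels (assumes no duplicate values with different labels)."""
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    # In 1-D, a single cut works only if the label changes at most once along the sorted axis
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


def phi(x: float) -> Tuple[float, float]:
    """Hypothetical feature map from R to R^2."""
    return (x, x * x)


mapped = [phi(x) for x in xs]
# In feature space the hyperplane with w = (0, 1), g = -2.5 separates the two classes
predictions = [1 if z[1] - 2.5 > 0 else -1 for z in mapped]
```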







Fig. 11.1 The basic structure of the SVM [12] 
Fig. 11.2 Mapping a non-linear data set into a feature space [6]



When slack variables are introduced, the problem of finding the hyperplane that minimizes the training errors can be formulated as follows:



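$$\min_{w,\, g,\, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + g) + \xi_i - 1 \ge 0, \quad \xi_i \ge 0, \quad i = 1, \ldots, m \qquad (11.5)$$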



where $C$ is a penalty parameter on the training error and $\xi_i$ is a non-negative slack variable. The constant $C$ determines the trade-off between margin size and training error. Note that $C$ must be positive; we cannot simply ignore the slack variables by setting $C = 0$. With a large value of $C$, the optimization tries to find a solution with few non-zero slack variables, because errors are costly [14]. In short, a large $C$ implies a small margin, and a small $C$ implies a large margin.
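For a fixed hyperplane, the smallest slack that satisfies the constraints of Eq. (11.5) is $\xi_i = \max(0, 1 - y_i(w \cdot x_i + g))$, so the primal objective can be evaluated directly. The sketch below uses toy data and assumed function names to show how $C$ weights the slack term against the margin term $\frac{1}{2} w^T w$.

```python
from typing import List, Sequence


def slacks(w: Sequence[float], g: float,
           xs: List[Sequence[float]], ys: List[int]) -> List[float]:
    """Smallest feasible slack per sample: xi_i = max(0, 1 - y_i (w . x_i + g))."""
    return [max(0.0, 1.0 - y * (sum(wk * xk for wk, xk in zip(w, x)) + g))
            for x, y in zip(xs, ys)]


def soft_margin_objective(w: Sequence[float], g: float, xs: List[Sequence[float]],
                          ys: List[int], C: float) -> float:
    """Primal objective of Eq. (11.5): 0.5 * w^T w + C * sum(xi)."""
    if C <= 0:
        raise ValueError("C must be positive")
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slacks(w, g, xs, ys))


# Third point is labeled -1 but lies on the positive side, so its slack exceeds 1 (misclassified)
xs = [(1.0, 1.0), (-1.0, -1.0), (0.2, 0.2)]
ys = [1, -1, -1]
w, g = [0.5, 0.5], 0.0
```

With $C = 1$ the objective is $0.25 + 1.2 = 1.45$; with $C = 10$ the same error costs ten times more, which pushes the optimizer towards hyperplanes with fewer violations.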

The Lagrangian method can be used to solve this optimization model in almost the same way as in the separable case. One has to maximize the dual Lagrangian:

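$$\max_{\alpha} \; L_D(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (11.6)$$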

The dual Lagrangian $L_D(\alpha)$ has to be maximized with respect to the non-negative $\alpha_i$ under the constraints $\sum_{i=1}^{m} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$ to determine the optimal hyperplane. The penalty parameter $C$ is an upper bound on the $\alpha_i$ and is chosen by the user.

The mapping function $\Phi$ is used to map the training samples from the input space into a higher-dimensional feature space. In Eq. (11.6), the inner products are replaced by the kernel function $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, and the nonlinear SVM dual Lagrangian $L_D(\alpha)$ shown in Eq. (11.7) is similar to that in the linear generalized case:

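$$L_D(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (11.7)$$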

Hence, following the steps of the linear generalized case, we derive a decision function of the following form:

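$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i^* \langle \Phi(x), \Phi(x_i) \rangle + g^* \right) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i^* K(x, x_i) + g^* \right) \qquad (11.8)$$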

The function $K$ is the kernel function, which generates the inner products needed to construct machines with different types of nonlinear decision surfaces in the input space. Several kernel functions are listed below; the choice of kernel type depends on the problem's complexity [12].

Radial Basis Function (RBF): $K(x, x_i) = \exp\left(-\|x - x_i\|^2 / (2\sigma^2)\right)$
Polynomial kernel of degree $d$: $K(x, x_i) = \langle x, x_i \rangle^d$
Sigmoid kernel: $K(x, x_i) = \tanh\left(\kappa \langle x, x_i \rangle + r\right)$
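The kernels listed above, together with the kernel decision function of Eq. (11.8), can be written out directly. The support vectors, multipliers and $\sigma$ in the usage lines are hypothetical values chosen for illustration, not the output of training. `kappa` is an assumed name for the sigmoid scale parameter.

```python
import math
from functools import partial
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, sigma: float) -> float:
    """RBF kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x: Vector, z: Vector, d: int) -> float:
    """Homogeneous polynomial kernel of degree d: <x, z>^d."""
    return sum(a * b for a, b in zip(x, z)) ** d


def sigmoid_kernel(x: Vector, z: Vector, kappa: float, r: float) -> float:
    """Sigmoid kernel: tanh(kappa <x, z> + r)."""
    return math.tanh(kappa * sum(a * b for a, b in zip(x, z)) + r)


def kernel_decision(x: Vector, svs: List[Vector], ys: List[int], alphas: List[float],
                    g: float, kernel: Callable[[Vector, Vector], float]) -> int:
    """Eq. (11.8): sign(sum_i y_i alpha_i K(x, x_i) + g) over the support vectors."""
    score = sum(y * a * kernel(x, sv) for sv, y, a in zip(svs, ys, alphas)) + g
    return 1 if score >= 0 else -1


# Hypothetical support vectors and multipliers (not produced by training)
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [1.0, 1.0]
rbf = partial(rbf_kernel, sigma=1.0)
```

Passing the kernel as a callable keeps the decision function unchanged when the kernel type is switched, which mirrors how only $K$ changes between Eqs. (11.7) and (11.8).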
11.2 Data Preprocessing

Real-world data are often incomplete, noisy and inconsistent, and irrelevant or redundant attributes increase the computational complexity and decrease the performance of data mining models. To be useful for data mining, the original data need to be preprocessed through cleaning, transformation, and reduction. Without these preprocessing procedures, the data can confuse the mining procedure and produce unreliable output.



11.2.1 Data Cleaning 

The purpose of data cleaning is to fill in missing values, remove noise (outliers), and correct inconsistencies in the data. The following approaches can be used for missing values [9, 21, 35, 37]:

• Ignore the missing value.
• Fill in the missing value manually.
• Use a global constant to replace the missing value.
• Use the attribute mean to replace the missing value.
• Use the most probable value to fill in the missing value.
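Two of the automatic strategies above, a global constant and the attribute mean, are easy to sketch. Here missing entries are represented as `None`, and the quick-ratio column is a made-up example.

```python
from statistics import mean
from typing import List, Optional


def impute_constant(column: List[Optional[float]], constant: float) -> List[float]:
    """Replace every missing entry (None) with a global constant."""
    return [constant if v is None else v for v in column]


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries with the mean of the observed values of the attribute."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("attribute has no observed values")
    m = mean(observed)
    return [m if v is None else v for v in column]


quick_ratio = [1.2, None, 0.8, 1.0, None]  # hypothetical attribute with two gaps
```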

Noisy data (e.g., outliers) result from random error or variance in the measured data. Even a small number of extreme values can change the results and impair the conclusions. Smoothing methods (e.g., binning, regression and clustering) can offset the effect of a small number of extreme values [3, 28, 37, 44]. Human error in data entry, deliberate errors and data decay are some of the reasons for inconsistent data. Missing values, noise, and inconsistent data lead to inaccurate results, so data cleaning is the first step in analyzing the original data and is needed for reliable mining results. Figure 11.3 illustrates how the original data are processed by data cleaning [9, 36].
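Smoothing by binning, one of the methods named above, fits in a few lines. This version assumes equal-frequency bins and replaces every value by its bin mean. The price list is a toy example.

```python
from typing import List


def smooth_by_bin_means(values: List[float], bin_size: int) -> List[float]:
    """Equal-frequency binning: sort, split into bins of bin_size, replace each value by its bin mean.

    The result is returned in sorted order, not in the original order of `values`.
    """
    ordered = sorted(values)
    smoothed: List[float] = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # toy data
```

With bins of three, the values collapse to 9, 22 and 29, which damps the influence of the extremes 4 and 34.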

[Figure: original data → clean data]

Fig. 11.3 Data cleaning [12]

11.2.2 Data Transformation 

Data transformation converts or consolidates data into forms suitable for the data mining process. It consists of the following processes [15, 17, 36, 38, 39]:

• Smoothing removes noise from the data, as illustrated in Fig. 11.4.

• Aggregation summarizes the data, for example to construct a data cube for analysis.

• Generalization replaces low-level data with higher-level data.

• Normalization scales the attribute data to fall within a small specified range.
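Min-max normalization is one common way to scale an attribute into a small specified range. The chapter does not prescribe a particular scheme, so this choice is an assumption.

```python
from typing import List


def min_max_normalize(values: List[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[float]:
    """Linearly rescale values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: map everything to the lower end of the target range
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]
```

For example, `[2, 4, 6, 10]` maps to `[0.0, 0.25, 0.5, 1.0]`. Scaling matters for the RBF kernel in particular, because unscaled attributes with large ranges would dominate $\|x - x_i\|^2$.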
Fig. 11.4 The process of smoothing

11.2.3 Data Reduction 

The purpose of data reduction is to create a reduced representation of the dataset that is much smaller in volume yet closely maintains the integrity of the raw data. Working with the reduced dataset improves efficiency while producing the same (or almost the same) analytical results. Data reduction consists of the following processes [1, 2, 4, 5, 7, 18, 19, 24, 40, 45]:

• Data cube aggregation is used to construct a data cube, as illustrated in Fig. 11.5.

• Attribute selection is used to remove irrelevant, redundant or weak attributes, as shown in Fig. 11.6.

• Dimension reduction is used to reduce or compress the representation of the raw dataset. If the raw data can be reconstructed from the compressed data without any loss of information, the reduction is called lossless; if only an approximation of the raw data can be reconstructed, it is called lossy.
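Attribute selection can be carried out with sequential forward selection (SFS), the method used in Section 11.6. The sketch below is a generic greedy version under stated assumptions. `score` is a hypothetical callable that rates a feature subset; in practice it could be, for example, the cross-validated accuracy of an SVM trained on those features. Selection stops when no single addition improves the score.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence


def sequential_forward_selection(features: Sequence[str],
                                 score: Callable[[FrozenSet[str]], float],
                                 max_features: Optional[int] = None) -> List[str]:
    """Greedily add the feature that most improves score(); stop when no addition helps."""
    selected: List[str] = []
    remaining = list(features)
    best_score = score(frozenset())
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every one-feature extension of the current subset
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        trial_score, candidate = max(trials, key=lambda t: t[0])
        if trial_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = trial_score
    return selected


# Hypothetical criterion: relevance weights minus a small cost per selected feature
weights = {"A1": 0.5, "A6": 0.8}


def criterion(subset: FrozenSet[str]) -> float:
    return sum(weights.get(f, 0.0) for f in subset) - 0.1 * len(subset)


chosen = sequential_forward_selection(["A1", "A2", "A6"], criterion)
```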
Fig. 11.5 Aggregation of the data cube [12]
Fig. 11.6 Attribute selection [12]
11.3 Parameter Determination of Support Vector Machines by Meta-heuristics

Appropriate parameter settings can improve the performance of SVM models. Two parameters ($C$ and $\sigma$) have to be determined in an SVM model with the RBF kernel. The parameter $C$ is the penalty cost, which influences the classification performance. If $C$ is too large, the classification accuracy is very high on the training data set but very low on the testing data set; if $C$ is too small, the classification accuracy is unsatisfactory. The parameter $\sigma$ has more influence than $C$ on the classification outcome, because its value affects the partitioning outcome in the feature space. With the RBF kernel defined above, a small value of $\sigma$ leads to over-fitting, while a large value results in under-fitting [22]. Grid search [24] is the most common approach to determining the parameters of SVM models. Nevertheless, this approach is a local search technique and tends to reach local optima [20]. Furthermore, setting appropriate search intervals is a key difficulty: a large search interval increases the computational complexity, while a small search interval may lead to an inferior outcome. Several metaheuristics have been proposed to select satisfactory parameters for SVM models [29, 30, 31, 32, 33, 34, 35]. The basic idea is to express the fitness function of the metaheuristic in terms of a classification performance criterion (classification accuracy or error) of the SVM model, so that the fitness of a candidate parameter set measures the classification accuracy it achieves. Formulating the classification performance criterion in a form the metaheuristic algorithm can use is the most critical part of this procedure.
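The grid search discussed above can be sketched as an exhaustive loop over exponentially spaced values. The exponent ranges and `toy_accuracy`, a hypothetical stand-in for cross-validated SVM accuracy, are assumptions. The number of evaluations is the product of the two grid sizes, which is why wide intervals are costly.

```python
import math
from typing import Callable, Iterable, Tuple


def grid_search(evaluate: Callable[[float, float], float],
                c_exponents: Iterable[int],
                sigma_exponents: Iterable[int]) -> Tuple[float, float, float]:
    """Evaluate every (C, sigma) = (2^i, 2^j) pair; return (best_C, best_sigma, best_score)."""
    best = (float("nan"), float("nan"), -math.inf)
    sigma_list = list(sigma_exponents)
    for i in c_exponents:
        for j in sigma_list:
            C, sigma = 2.0 ** i, 2.0 ** j
            s = evaluate(C, sigma)
            if s > best[2]:
                best = (C, sigma, s)
    return best


def toy_accuracy(C: float, sigma: float) -> float:
    """Hypothetical stand-in for cross-validated accuracy, peaked at C = 8, sigma = 0.5."""
    return -(math.log2(C) - 3) ** 2 - (math.log2(sigma) + 1) ** 2


# 11 values of C times 6 values of sigma = 66 evaluations
best_C, best_sigma, best_score = grid_search(toy_accuracy, range(-5, 16, 2), range(-5, 6, 2))
```

Because the grid only contains odd exponents, an optimum lying between grid points would be missed, which illustrates the interval-setting problem described above.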

11.3.1 Genetic Algorithm 

Holland [13] proposed the genetic algorithm (GA) to understand the adaptive processes of natural systems. GAs were subsequently applied to optimization and machine learning in the 1980s. Originally, GAs were associated with binary representations, but today they are used with other types of representations and applied in many research domains. The basic principle is survival of the fittest: genetic information is passed on from generation to generation. The major merit of GAs is their ability to find optimal or near-optimal solutions with relatively modest computational requirements. The procedure is briefly described below and illustrated in Fig. 11.7:

(a) Initialization: The initial population of chromosomes is generated randomly.

(b) Fitness evaluation: The fitness of each chromosome is evaluated, using the classification accuracy as the fitness function.

(c) Selection: Mating pairs are selected for reproduction.

(d) Crossover and mutation: New offspring are created by performing crossover and mutation operations.

(e) Next generation: The population for the next generation is formed.

(f) Stop condition: If the number of generations reaches a threshold, the best chromosome is presented as the solution; otherwise, go back to step (b) [29, 31].
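A compact real-coded GA for the two SVM parameters follows the steps above. Several details are assumptions, since the chapter does not fix them: the operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism), the search box over (log2 C, log2 σ), and `toy_accuracy`, which stands in for the SVM's cross-validated accuracy.

```python
import math
import random
from typing import Callable, List, Tuple

Chromosome = List[float]              # [log2(C), log2(sigma)]
BOUNDS = [(-5.0, 15.0), (-5.0, 5.0)]  # assumed search box for log2(C) and log2(sigma)


def toy_accuracy(C: float, sigma: float) -> float:
    """Hypothetical stand-in for the SVM's cross-validated accuracy, peaked at C = 8, sigma = 0.5."""
    return -(math.log2(C) - 3) ** 2 - (math.log2(sigma) + 1) ** 2


def clip(chrom: Chromosome) -> Chromosome:
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(chrom, BOUNDS)]


def genetic_algorithm(fitness: Callable[[float, float], float], pop_size: int = 30,
                      generations: int = 60, mutation_rate: float = 0.2,
                      seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)

    def score(chrom: Chromosome) -> float:
        return fitness(2.0 ** chrom[0], 2.0 ** chrom[1])

    # (a) Initialization: random chromosomes inside the search box
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):                  # (f) stop after a fixed number of generations
        # (b) Fitness evaluation; the two best chromosomes survive unchanged (elitism)
        ranked = sorted(population, key=score, reverse=True)
        offspring = ranked[:2]
        while len(offspring) < pop_size:
            # (c) Selection: each parent is the fitter of two randomly drawn chromosomes
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            # (d) Arithmetic crossover followed by Gaussian mutation
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            child = [v + rng.gauss(0.0, 0.5) if rng.random() < mutation_rate else v
                     for v in child]
            offspring.append(clip(child))
        population = offspring                    # (e) next generation
    best = max(population, key=score)
    return 2.0 ** best[0], 2.0 ** best[1]


best_C, best_sigma = genetic_algorithm(toy_accuracy)
```

Searching in log2 space lets one mutation step change C or σ by a constant factor, which suits parameters whose useful values span several orders of magnitude.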
Fig. 11.7 The architecture of GA to determine parameters of SVM

11.3.2 Immune Algorithm 

The immune algorithm (IA) [10] is based on the natural immune system, which efficiently distinguishes all cells within the body and classifies them as self or non-self cells. Non-self cells trigger a defense procedure against foreign invaders. Here, each antibody represents the two SVM parameters. The classification error of the SVM appears in the denominator of the affinity formula; therefore, maximizing the affinity in the IA minimizes the classification error of the SVM model. The IA search algorithm applied to determine the SVM parameters is described below and illustrated in Fig. 11.8:

(a) Initialization: The initial antibody population is created randomly.

(b) Fitness evaluation: The classification error (CE) of the SVM is used as the fitness of the IA.

(c) Affinity and similarity: Antibodies with high affinity values, i.e., with higher activation levels with respect to the antigen, are identified. To maintain the diversity of the antibodies stored in the memory cells, antibodies with a higher affinity value and a lower similarity value are more likely to enter the memory cells. Eq. (11.9) gives the affinity between an antibody and the antigen:

$$\text{Antigen} = \frac{1}{1 + CE} \qquad (11.9)$$

A smaller CE indicates a higher affinity value. Eq. (11.10) gives the similarity between antibodies:

$$\text{Antibodies}_{ij} = \frac{1}{1 + G_{ij}} \qquad (11.10)$$

where $G_{ij}$ is the difference between the two classification errors calculated for the antibodies inside and outside the memory cells.

(d) Selection: Select the antibodies to be stored in the memory cells. Antibodies with higher Antigen values are treated as candidates to enter the memory cells; however, candidates whose $\text{Antibodies}_{ij}$ values exceed a threshold are not qualified to enter.

(e) Crossover and mutation: The antibody population undergoes crossover and mutation to generate new antibodies. In the crossover operation, strings representing antibodies are paired randomly, and the segments of each pair between two predetermined break-points are swapped.

(f) Tabu search [11] on each antibody: Evaluate the neighbor antibodies and update the tabu list. The neighbor antibody with the best classification error that is not recorded on the tabu list is placed on the tabu list. If the best neighbor antibody is already on the tabu list, the next set of neighbor antibodies is generated and their classification errors are calculated. The next set of neighbor antibodies is generated from the best neighbor antibodies of the current iteration.

(g) Current antibody selection by tabu search: If the best neighbor antibody is better than the current antibody, the current antibody is replaced by the best neighbor antibody; otherwise, the current antibody is retained.

(h) Next generation: Form the population for the next generation.

(i) Stop criterion: If the number of epochs reaches a given limit, the best antibody is presented as the solution; otherwise, go to step (b) [32, 33].
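The affinity and similarity measures of Eqs. (11.9) and (11.10), together with the rule for admitting antibodies to the memory cells, can be sketched as follows. The following are all assumptions for illustration: taking $G_{ij}$ as the absolute difference of the two classification errors, the threshold and capacity values, and the antibody tuple layout. The tabu-search refinement is omitted.

```python
from typing import List, Tuple

Antibody = Tuple[float, float, float]  # (C, sigma, classification error CE); illustrative layout


def affinity(ce: float) -> float:
    """Eq. (11.9): affinity between antibody and antigen, 1 / (1 + CE)."""
    return 1.0 / (1.0 + ce)


def similarity(ce_i: float, ce_j: float) -> float:
    """Eq. (11.10), with G_ij taken as the absolute difference of the two errors (an assumption)."""
    return 1.0 / (1.0 + abs(ce_i - ce_j))


def update_memory(memory: List[Antibody], candidates: List[Antibody],
                  threshold: float, capacity: int) -> List[Antibody]:
    """Admit high-affinity candidates unless they are too similar to an antibody already stored."""
    memory = list(memory)
    for ab in sorted(candidates, key=lambda a: affinity(a[2]), reverse=True):
        if len(memory) >= capacity:
            break
        # Candidates whose similarity to any stored antibody exceeds the threshold are rejected
        if all(similarity(ab[2], m[2]) <= threshold for m in memory):
            memory.append(ab)
    return memory


candidates = [(1.0, 0.5, 0.12), (8.0, 0.5, 0.10), (32.0, 2.0, 0.30)]  # hypothetical antibodies
memory = update_memory([], candidates, threshold=0.95, capacity=2)
```

The antibody with CE = 0.12 has a high affinity but is rejected because it is nearly identical to the stored CE = 0.10 antibody. This is how the similarity test preserves diversity in the memory cells.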
Fig. 11.8 The architecture of IA/TS to determine parameters of SVM



11.3.3 Particle Swarm Optimization 



The particle swarm optimization (PSO) algorithm [16] is another population-based metaheuristic, inspired by swarm intelligence. It simulates the behavior of a flock of birds moving towards a promising position with sufficient food. A particle is considered as a point in a $G$-dimensional space, and its status is characterized by its position $y_{ig}$ and velocity $s_{ig}$. The $G$-dimensional position of particle $i$ at iteration $t$ is expressed as $y_i^t = \{y_{i1}^t, \ldots, y_{iG}^t\}$.

The velocity of particle $i$ at iteration $t$, which is also a $G$-dimensional vector, is denoted by $s_i^t = \{s_{i1}^t, \ldots, s_{iG}^t\}$. Let $b_i^t = \{b_{i1}^t, \ldots, b_{iG}^t\}$ be the best solution that particle $i$ has obtained up to iteration $t$, and let $b_m^t = \{b_{m1}^t, \ldots, b_{mG}^t\}$ denote the best solution among all $b_i^t$ in the population at iteration $t$. To search for an optimal solution, each particle changes its velocity according to cognition (its own best solution) and sociality (the population's best solution), and then moves to a new potential solution. The use of the PSO algorithm to select SVM parameters is described as follows. First, a random population of particles and velocities is initialized. Second, the fitness of each particle is defined; the fitness function of PSO is the classification accuracy of the SVM model. Each particle's velocity is then updated by Eq. (11.11), and each particle moves to its next position according to Eq. (11.12).



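$$s_{ig}^{t+1} = s_{ig}^{t} + c_1 \gamma_1 \left( b_{ig}^{t} - y_{ig}^{t} \right) + c_2 \gamma_2 \left( b_{mg}^{t} - y_{ig}^{t} \right), \quad g = 1, \ldots, G \qquad (11.11)$$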



where $c_1$ is the cognitive learning factor, $c_2$ is the social learning factor, and $\gamma_1$ and $\gamma_2$ are random numbers uniformly distributed in $U(0, 1)$.



$$y_{ig}^{t+1} = y_{ig}^{t} + s_{ig}^{t+1}, \quad g = 1, \ldots, G \qquad (11.12)$$



Finally, if the termination criterion is reached, the algorithm stops; otherwise re- 
turn to the step of fitness measurement [34]. The architecture of PSO is illustrated 
in Fig. 11.9. 
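A compact PSO for (C, σ) follows. It searches over (log2 C, log2 σ) inside an assumed box and uses a hypothetical `toy_accuracy` stand-in for SVM accuracy. Two additions to Eq. (11.11) are assumptions, both common in practice to keep the swarm stable. The first is an inertia weight on the previous velocity; `inertia=1.0` recovers the equation exactly. The second is velocity clamping at ±`v_max`.

```python
import math
import random
from typing import Callable, List, Tuple

BOUNDS = [(-5.0, 15.0), (-5.0, 5.0)]  # assumed search box for (log2 C, log2 sigma)


def toy_accuracy(C: float, sigma: float) -> float:
    """Hypothetical stand-in for SVM accuracy, peaked at C = 8, sigma = 0.5."""
    return -(math.log2(C) - 3) ** 2 - (math.log2(sigma) + 1) ** 2


def pso(fitness: Callable[[float, float], float], n_particles: int = 20, iterations: int = 60,
        c1: float = 1.49, c2: float = 1.49, inertia: float = 0.729, v_max: float = 2.0,
        seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    G = len(BOUNDS)

    def score(y: List[float]) -> float:
        return fitness(2.0 ** y[0], 2.0 ** y[1])

    # Random initial positions y_i and velocities s_i
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(G)] for _ in range(n_particles)]
    b_own = [list(p) for p in pos]                        # b_i: best position of particle i
    b_own_fit = [score(p) for p in pos]
    m = max(range(n_particles), key=lambda i: b_own_fit[i])
    b_swarm, b_swarm_fit = list(b_own[m]), b_own_fit[m]  # b_m: best among all b_i

    for _ in range(iterations):
        for i in range(n_particles):
            for g in range(G):
                r1, r2 = rng.random(), rng.random()      # gamma_1, gamma_2 ~ U(0, 1)
                v = (inertia * vel[i][g]
                     + c1 * r1 * (b_own[i][g] - pos[i][g])
                     + c2 * r2 * (b_swarm[g] - pos[i][g]))  # Eq. (11.11) when inertia = 1.0
                vel[i][g] = max(-v_max, min(v_max, v))      # velocity clamping
                lo, hi = BOUNDS[g]
                pos[i][g] = max(lo, min(hi, pos[i][g] + vel[i][g]))  # Eq. (11.12), kept in the box
            f = score(pos[i])
            if f > b_own_fit[i]:
                b_own[i], b_own_fit[i] = list(pos[i]), f
                if f > b_swarm_fit:
                    b_swarm, b_swarm_fit = list(pos[i]), f
    return 2.0 ** b_swarm[0], 2.0 ** b_swarm[1]


best_C, best_sigma = pso(toy_accuracy)
```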
Fig. 11.9 The architecture of PSO to determine parameters of SVM



11.4 Rule Extraction from Support Vector Machines



Support vector machines are state-of-the-art data mining techniques that have proven their performance in many research domains. Unfortunately, while these models may provide high accuracy compared with other data mining techniques, their comprehensibility is limited. In some areas, such as credit scoring, the lack of comprehensibility of a model is a major drawback that makes users reluctant to use it [8]. Furthermore, when credit has been denied to a customer, the Equal Credit Opportunity Act of the US requires the financial institution to provide specific reasons why the application was rejected; indefinite and vague reasons for denial are illegal [23]. Comprehensibility can be added to SVM by extracting symbolic rules from the trained model. Rule extraction techniques are used to open up the black box of SVM and generate comprehensible decision rules with approximately the same predictive power as the model itself. There are two ways to open up the black box of SVM, as shown in Fig. 11.10.
Fig. 11.10 Experimental (A) and decompositional (B) rule extraction techniques [23]

In the experimental rule extraction technique, the data set of the SVM with the best cross-validation (CV) result is fed into a rule-based classifier (e.g., a decision tree or rough set) to derive comprehensible decision rules for humans. The concept behind this procedure is the assumption that the trained model represents the data more appropriately than the original dataset does; that is, the data of the best CV result are cleaner and free of crucial conflicts. CV is a re-sampling technique that uses multiple random training and test subsamples to overcome the overfitting problem. Overfitting causes the SVM to lose its applicability to new data, as shown in Fig. 11.11. The CV analysis yields useful insights into the reliability of the SVM model with respect to sampling variation.
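A plain k-fold split illustrates the re-sampling idea behind CV. This sketch is not stratified; a stratified split would also preserve the 25:50 class ratio of the example in Section 11.6. With 75 samples and k = 5, each test fold holds 15 samples and each training set holds 60. The shuffle seed is an arbitrary choice.

```python
import random
from typing import List, Tuple


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once and cut them into k disjoint test folds.

    Each split is (training indices, test indices); every index is tested exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first n % k folds
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


splits = k_fold_splits(75, 5)
```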
Fig. 11.11 Classification errors vs. model complexity of SVM models [12]



Decompositional rule extraction was proposed by Nunez et al. [25, 26]; it defines rule regions based on prototype vectors and support vectors [23]. The prototype vectors are the representatives of the clusters obtained, and the clustering is performed by vector quantization. Two kinds of rules can be proposed: equation rules and interval rules, corresponding to ellipsoid and interval regions respectively, which can be built in the following manner [18]. Using the prototype vector as the center, an ellipsoid is constructed whose axes are determined by the support vector within the partition that lies furthest from the center. The long axis of the ellipsoid is defined by the straight line connecting these two vectors. The interval regions are defined from ellipsoids whose axes are parallel to the coordinate axes [23]. Figure 11.12 illustrates the basic structure of the SVM + Prototype approach.



[Figure: SVM + Prototype regions plotted against Age and Income, with example rules. Equation rule: If A1*age^2 + A2*income^2 + … then customer = good. Interval rule: If age ∈ [B1, B2] and income ∈ [B3, B4] then customer = good.]

Fig. 11.12 SVM + Prototype model [25, 26]



11.5 The Proposed Enhanced SVM Model 



This section describes the scheme of the proposed ESVM model. Figure 11.13 shows the flowchart of the ESVM model, which includes data preprocessing, parameter determination and rule generation. First, the raw data are processed by data preprocessing techniques, including data cleaning, data transformation, feature selection, and dimension reduction. Second, the preprocessed data are divided into training and testing data sets. The training data set is used to select the data set used for rule generation, and a cross-validation (CV) procedure is performed at this stage to prevent overfitting. The testing data set is used to examine the classification performance of the well-trained SVM model. Next, metaheuristics are used to determine the SVM parameters. The training errors of the SVM models serve as the fitness function of the metaheuristics, so successive iterations tend to produce smaller classification errors. The parameter search is performed until the stop criterion of the metaheuristic is reached. The pair of parameters yielding the smallest training error is then used in the testing procedure to obtain the testing accuracy. Finally, the CV training data set with the smallest testing error is used to derive decision rules by rule extraction mechanisms. Accordingly, the proposed ESVM model can provide decision rules as well as classification accuracy for decision makers.
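The fold-selection logic of this flow can be sketched independently of any particular SVM library. In this sketch `tune` and `test_error` are hypothetical callables. `tune` stands for the metaheuristic parameter search run on a fold's training data, and `test_error` for training an SVM with those parameters and measuring its error on the fold's test data. The toy usage replaces both with trivial stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Params = Tuple[float, float]        # (C, sigma)
Fold = Tuple[List[int], List[int]]  # (training indices, testing indices)


def esvm_select_fold(folds: Sequence[Fold],
                     tune: Callable[[List[int]], Params],
                     test_error: Callable[[Params, List[int], List[int]], float]
                     ) -> Tuple[int, Params, float]:
    """Tune (C, sigma) on each fold's training part, then measure its testing error.

    Returns (fold index, tuned parameters, testing error) for the fold with the smallest
    testing error; that fold's training set is the one handed to rule extraction.
    """
    best = None
    for k, (train_idx, test_idx) in enumerate(folds):
        params = tune(train_idx)  # e.g. GA, IA/TS or PSO driven by the training error
        err = test_error(params, train_idx, test_idx)
        if best is None or err < best[2]:
            best = (k, params, err)
    if best is None:
        raise ValueError("no folds given")
    return best


# Toy usage with stand-in callables (placeholders for real SVM training)
folds = [([0, 1], [2]), ([1, 2], [0]), ([0, 2], [1])]
errors = {2: 0.3, 0: 0.1, 1: 0.2}
result = esvm_select_fold(folds,
                          tune=lambda train: (1.0, float(len(train))),
                          test_error=lambda params, train, test: errors[test[0]])
rule_training_set = folds[result[0]][0]
```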
Fig. 11.13 The flowchart of the ESVM model



11.6 A Numerical Example and Empirical Results 

A numerical example from Pai et al. [34] is used here to illustrate classification and rule generation with SVM models. The original data in this example comprise 75 firms listed on Taiwan's stock market, divided into 25 fraudulent financial statement (FFS) firms and 50 non-fraudulent financial statement (non-FFS) firms. Published indication or proof of involvement in issuing FFS was found for the 25 FFS firms. The classification of a financial statement as fraudulent is based on the Securities and Futures Investors Protection Center in Taiwan (SFI) and the Financial Supervisory Commission of Taiwan (FSC) during the 1999-2005 reporting period. All condition variables used in the sample were derived from formal financial statements, such as balance sheets and income statements. Eighteen features, consisting of 16 financial variables and two corporate governance variables, were adopted in this study. The features selected by sequential forward selection (SFS) are listed in Table 11.1. In addition, the grid search (GS) approach, genetic algorithms (GA), simulated annealing (SA) and particle swarm optimization (PSO) were applied to the same data to select SVM parameters. The classification performance of the four approaches in determining SVM parameters is summarized in Table 11.2. In this study, the PSO algorithm was superior to the other three approaches in terms of average testing accuracy. To demonstrate the generalization ability of SVM, three other classifiers were examined: the C4.5 decision tree (C4.5), multi-layer perceptron (MLP) neural networks, and RBF networks. Table 11.3 indicates that the SVM model outperformed the other three classifiers in terms of testing accuracy. Moreover, the CART approach was used to derive "if-then" rules from the CV training data set with the best testing result; such rules can help auditors allocate limited audit resources. The decision rules derived from CART are listed in Table 11.4. The feature "pledged shares of directors" is the first split point, which implies that shares pledged by directors are essential in detecting FFS by top management. Auditors therefore have to concentrate on this critical signal in audit procedures.



Table 11.1 Features selected by feature selection [34]

Method   Features
SFS      A1: Net income to fixed assets
         A2: Net profit to total assets
         A3: Earnings before interest and tax
         A4: Inventory to sales
         A5: Total debt to total assets
         A6: Pledged shares of directors



Table 11.2 Classification performance of four methods in determining SVM parameters [34]

Method   Cross-validation accuracy (%)
         CV-1    CV-2    CV-3    CV-4    CV-5    Average
Grid     86.67   80      73.33   80      80      80
GA       80      86.67   80      86.67   86.67   84
SA       80      86.67   86.67   93.33   96.67   86.67
PSO      93.33   80      93.33   93.33   93.33   92
Table 11.3 Testing accuracy of four classifiers [34]

Classifier   Cross-validation accuracy (%)
             CV-1    CV-2    CV-3    CV-4    CV-5    Average
C4.5         73.33   80      86.67   93.33   86.67   84
MLP          73.33   86.67   80      86.67   86.67   82.67
RBFNN        86.67   80      80      86.67   80      82.67
SVM          93.33   86.67   93.33   93.33   93.33   92



Table 11.4 Decision rules derived from CART [34]

(1) If "pledged shares of directors" ≥ 44.405, then "FFS"

(2) If "pledged shares of directors" < 44.405 and "net profit to total assets" < -0.3229, then "FFS"

(3) If "pledged shares of directors" < 44.405, "net profit to total assets" ≥ -0.3229 and "net income to fixed assets" ≥ 0.0497, then "non-FFS"

(4) If "pledged shares of directors" < 44.405, "net profit to total assets" ≥ -0.3229, "net income to fixed assets" < 0.0497 and "earnings before interest and tax" < -42220, then "non-FFS"

(5) If "pledged shares of directors" < 44.405, "net profit to total assets" ≥ -0.3229, "net income to fixed assets" < 0.0497, "earnings before interest and tax" ≥ -42220, and "total debt to total assets" ≥ 1.48, then "FFS"

(6) If "pledged shares of directors" < 44.405, "net profit to total assets" ≥ -0.3229, "net income to fixed assets" < 0.0497, "earnings before interest and tax" ≥ -42220, and "total debt to total assets" < 1.48, then "non-FFS"
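Read as a decision tree, the six CART rules can be applied mechanically. The sketch below assumes that each threshold splits on "≥" versus "<". This is an assumption, because the comparison symbols in the source are partly illegible. The function and parameter names are hypothetical, and the sample inputs in the checks are made up.

```python
def classify_ffs(pledged: float, net_profit_ta: float, net_income_fa: float,
                 ebit: float, debt_ta: float) -> str:
    """Apply the Table 11.4 rules as a binary tree (comparison directions assumed)."""
    if pledged >= 44.405:            # rule (1)
        return "FFS"
    if net_profit_ta < -0.3229:      # rule (2)
        return "FFS"
    if net_income_fa >= 0.0497:      # rule (3)
        return "non-FFS"
    if ebit < -42220:                # rule (4)
        return "non-FFS"
    # rules (5) and (6) split on total debt to total assets
    return "FFS" if debt_ta >= 1.48 else "non-FFS"
```

Because the first test is on pledged shares of directors, any firm above that threshold is flagged regardless of its other ratios. This matches the observation that this feature is the first split point.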



11.7 Conclusion 

In this chapter, three essential issues influencing the performance of SVM models were identified: data preprocessing, parameter determination and rule extraction. Each issue has previously been investigated separately; however, this chapter is the first study to propose an enhanced SVM model that deals with all three issues at the same time. Thanks to the data preprocessing procedure, the computational cost decreases and the classification accuracy increases. Furthermore, the ESVM model provides rules for decision makers. Compared with complicated mathematical functions, a set of rules lets decision makers intuitively understand the relationship between condition attributes and outcomes, and its strength. These rules can be used for both forward and backward reasoning. For the example in Section 11.6, forward reasoning can guide managers in improving the current financial status, and backward reasoning can help protect investors' wealth and sustain the stability of the financial market.
Chapter 12

Benchmark Problems in Structural Optimization

Amir Hossein Gandomi and Xin-She Yang 



Abstract. Structural optimization is an important area related to both optimization and structural engineering. Structural optimization problems are often used as benchmarks to validate new optimization algorithms or to test the suitability of a chosen algorithm. In almost all structural engineering applications, it is very important to find the best possible parameters for given design objectives and constraints, which are highly nonlinear and involve many different design variables. The field of structural optimization is also undergoing rapid changes in terms of methodology and design tools. Thus, it is necessary to summarize some benchmark problems for structural optimization. This chapter provides an overview of structural optimization problems for both truss and non-truss cases.

12.1 Introduction to Benchmark Structural Design 

New optimization algorithms are often tested and validated against a wide range of test functions so as to compare their performance. Structural optimization problems are complex and highly nonlinear; sometimes the optimal solutions of interest do not even exist. In order to see how an optimization algorithm performs, some standard structural engineering test problems are often solved. Many structural test functions exist in the literature, but there is no standard list or set of functions that one has to follow. Any new optimization algorithm should be tested using at least a subset of well-known, well-established functions with diverse properties, so as to determine whether or not the tested algorithm can solve certain types of optimization problems efficiently. According to the nature of structural optimization problems, we can first divide them into two groups: truss and non-truss design problems. The selected test problems for each group are listed below:

Truss design problems:

• 10-bar plane truss
• 25-bar spatial truss
• 72-bar truss
• 120-bar truss dome
• 200-bar plane truss
• 26-story truss tower

Non-truss design problems:

• Welded beam
• Reinforced concrete beam
• Compression spring
• Pressure vessel
• Speed reducer
• Stepped cantilever beam
• Frame optimization

12.1.1 Structural Engineering Design and Optimization 

Many problems in structural engineering and other disciplines involve the design optimization of dozens to thousands of parameters, and the choice of these parameters can affect the performance or objectives of the system concerned. The optimization target is often measured in terms of objectives or cost functions in quantitative models. Structural engineering design and testing often require an iterative process with parameter adjustments. An optimization problem can generally be formulated as:



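$$\text{Optimize: } f(\mathbf{X}), \qquad (12.1)$$

subject to

$$g_i(\mathbf{X}) \ge 0, \quad i = 1, 2, \ldots, N, \qquad (12.2)$$

$$h_j(\mathbf{X}) = 0, \quad j = 1, 2, \ldots, M, \qquad (12.3)$$

where $\mathbf{X} = (x_1, x_2, \ldots, x_n) \in \Omega$ (parameter space).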
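One common way to hand a problem of the form (12.1)-(12.3) to an unconstrained optimizer is a static penalty function. The sketch below assumes minimization and uses quadratic penalties with user-chosen weights `mu` and `nu`; the name `penalized_objective` and this particular penalty form are illustrative choices, not a method prescribed in this chapter.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def penalized_objective(
    f: Callable[[Vector], float],
    ineq: Sequence[Callable[[Vector], float]],  # g_i(X) >= 0 when satisfied
    eq: Sequence[Callable[[Vector], float]],    # h_j(X) == 0 when satisfied
    mu: float = 1e6,
    nu: float = 1e6,
) -> Callable[[Vector], float]:
    """Return X -> f(X) plus quadratic penalties for violated constraints."""

    def fp(x: Vector) -> float:
        # g_i(X) >= 0 is violated by the amount max(0, -g_i(X))
        p_ineq = sum(max(0.0, -g(x)) ** 2 for g in ineq)
        # h_j(X) = 0 is violated by |h_j(X)|
        p_eq = sum(h(x) ** 2 for h in eq)
        return f(x) + mu * p_ineq + nu * p_eq

    return fp


# Minimize x0^2 + x1^2 subject to x0 + x1 - 1 >= 0
fp = penalized_objective(lambda x: x[0] ** 2 + x[1] ** 2, [lambda x: x[0] + x[1] - 1], [])
print(fp([0.5, 0.5]))  # feasible point: just the objective, 0.5
print(fp([0.0, 0.0]))  # infeasible point: charged mu * 1**2
```

The penalty weights usually need tuning for each benchmark; too small a weight lets infeasible designs win, while very large weights can make the landscape hard to search.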

Most design optimization problems in structural engineering involve many different design variables under complex constraints. These constraints can be written either as simple bounds, such as the ranges of material properties, or as non-linear relationships, including maximum stress, maximum deflection, minimum load capacity, and geometrical configuration. Such non-linearity often results in a multimodal response landscape.
The basic requirement for an efficient structural design is that the response of the structure be acceptable for the given specifications. That is, a set of parameters should at least yield a feasible design. There can be a very large number of feasible designs, but it is desirable to choose the best of these designs. The best design can be identified using minimum cost, minimum weight, maximum performance, or a combination of these [1]. Obviously, parameters may have associated uncertainties, and in this case a robust design solution, not necessarily the best solution, is often the best choice in practice. As the ranges of parameter variations are usually very large, systematic adaptive searching or optimization procedures are required. In the past several decades, researchers have developed many optimization algorithms. Examples of conventional methods are hill climbing, gradient methods, random search, simulated annealing, and heuristic methods. Examples of evolutionary or biology-inspired algorithms are genetic algorithms [2], neural networks [3], particle swarm optimization [4], the firefly algorithm [5], cuckoo search [6], and many others. The method used to solve a particular structural problem depends largely on the type and characteristics of the optimization problem itself. There is no universal method that works for all structural problems, and there is generally no guarantee of finding the globally optimal solution in highly nonlinear global optimization problems. In general, we can only aim for the best estimate or suboptimal solutions under given conditions. Knowledge about a particular problem is always helpful in choosing the best or most efficient methods for the optimization procedure.

12.2 Classifications of Benchmarks 

Generally, an optimization problem is classified according to the nature of its equations with respect to the design variables and the characteristics of the objectives and constraints. If the objective function and all the constraints are linear in the design variables, the optimization is called a linear optimization problem. If even one of them is non-linear, it is classified as a non-linear optimization problem [1].

Design variables can be continuous or discrete (integer or non-integer). In structural engineering, most problems are mixed-variable problems, as they contain both continuous and discrete variables. The structural optimization of bar or truss sections often includes a special set of variables which are integer multiples of certain sizes and dimensions.
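A simple way to treat variables that must be integer multiples of a size, or members of a list of standard sizes, is to let the optimizer search a continuous value and round it before the design is evaluated. The helpers below are a minimal sketch of that idea; the names and the example catalogue are hypothetical, and rounding is only one of several possible discrete-variable strategies.

```python
from typing import Sequence


def snap_to_multiple(x: float, step: float, lo_mult: int, hi_mult: int) -> float:
    """Round x to the nearest integer multiple k*step, with k clipped to [lo_mult, hi_mult].

    Python's round() sends exact ties to the even multiple.
    """
    k = round(x / step)
    k = max(lo_mult, min(hi_mult, k))
    return k * step


def snap_to_catalogue(x: float, catalogue: Sequence[float]) -> float:
    """Return the catalogue entry closest to x (e.g. a list of standard section sizes)."""
    return min(catalogue, key=lambda value: abs(value - x))


# A thickness restricted to multiples of 0.0625 in, between 1 and 99 multiples:
print(snap_to_multiple(0.79, 0.0625, 1, 99))          # 0.8125
# A hypothetical list of available sizes:
print(snap_to_catalogue(1.3, [0.5, 1.0, 1.5, 2.0]))   # 1.5
```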

According to the number of variables, constraints, and objective function(s), an 
optimization problem can be classified as small scale, normal scale, large scale 
and very large scale. 

Nearly all design optimization problems in structural engineering are highly non-linear, involving many different design variables under complex, nonlinear constraints. In this chapter, benchmark optimization problems are classified into two groups: truss and non-truss. First, we introduce truss design problems. Truss structures are designed to carry multiple loading conditions under static constraints concerning nodal displacements, stresses in the members and critical buckling loads. This class of problems was chosen because truss structures are widely used in structural engineering [7, 8]. Also, examples of truss structure design optimization are extensively used in the literature to compare the efficiency of optimization algorithms [9-12]. Then, we introduce seven examples of non-truss optimization problems under static constraints.

12.3 Design Benchmarks 
12.3.1 Truss Design Problems 

Truss optimization is a challenging area of structural optimization, and many researchers have tried to minimize the weight (or volume) of truss structures using different algorithms. For example, Maglaras et al. [13] compared probabilistic and deterministic optimization of trusses and showed that probabilistic optimization provided a significant improvement. Hasançebi et al. [14] evaluated some well-known metaheuristic search techniques in the optimum design of real trusses and found that simulated annealing and evolution strategies perform well in this area.

For most truss optimization problems, the objective function can be expressed as

$$\text{minimize} \quad W(\mathbf{A}) = \sum_{i=1}^{NM} \gamma_i A_i L_i, \qquad (12.4)$$

where W(A) is the weight of the structure; NM is the number of members in the structure; γ_i represents the material density of member i; L_i is the length of member i; and A_i is the cross-sectional area of member i, chosen between A_min and A_max (the lower and upper bounds, respectively). Any optimal design also has to satisfy some inequality constraints that limit design variable sizes and structural responses [15].
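Because Eq. (12.4) is explicit, the weight follows directly from the nodal coordinates and the member connectivity. The sketch below assumes members are given as pairs of node indices and that all quantities use consistent units; `truss_weight` is a hypothetical helper name.

```python
import math
from typing import Sequence


def truss_weight(
    nodes: Sequence[Sequence[float]],      # (x, y) or (x, y, z) coordinates
    members: Sequence[Sequence[int]],      # (i, j) node-index pair per member
    areas: Sequence[float],                # A_i
    gamma: Sequence[float],                # gamma_i (material density)
) -> float:
    """W(A) = sum_i gamma_i * A_i * L_i, Eq. (12.4)."""
    w = 0.0
    for (i, j), a, g in zip(members, areas, gamma):
        length = math.dist(nodes[i], nodes[j])  # L_i
        w += g * a * length
    return w


# Three members on a 3-4-5 right triangle, gamma = 0.1 for every member:
w = truss_weight([(0, 0), (3, 0), (0, 4)], [(0, 1), (1, 2), (0, 2)], [1.0, 2.0, 3.0], [0.1] * 3)
print(round(w, 6))  # 2.5
```

Here the weight is 0.1·(1·3 + 2·5 + 3·4) = 2.5; the same function works unchanged for spatial trusses because `math.dist` accepts points of any dimension.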

The main issue in truss optimization is handling the constraints, because the weight of a truss structure itself can be expressed by an explicit formula [16]. Generally, a truss structure is subject to the following three kinds of constraints:

Stress constraints: Each member is under tensile or compressive stress. For each member of the structure, the positive tensile stress should be less than the allowable tensile stress (σ_max), while the compressive stress should be less than the allowable compressive stress (σ_min) in magnitude. Thus, each truss optimization problem has 2NM stress constraints, which can be formulated as follows:

$$\sigma_{i,\min} \le \sigma_i \le \sigma_{i,\max}, \quad i = 1, 2, \ldots, NM \qquad (12.5)$$

Deflection constraints: The nodal deflections (displacements at each node) should be limited within the maximum deflection (δ_max). When a truss has NM_2 nodes, this constraint can be defined as:

$$\delta_j \le \delta_{j,\max}, \quad j = 1, 2, \ldots, NM_2 \qquad (12.6)$$






Buckling constraints: Buckling can be defined as the failure of a member due to high compressive stress; at the point of failure, the applied compressive stress exceeds the bearing capacity of the member. When a member is in compression, its buckling status is controlled according to the buckling stress (σ_b). Letting NC denote the number of compression elements, we have

$$\sigma_k^{b} \le \sigma_k \le 0, \quad k = 1, 2, \ldots, NC \qquad (12.7)$$
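In many metaheuristic studies, Eqs. (12.5)-(12.7) are not imposed exactly but measured as amounts of violation that are then penalized. The sketch below assumes the member stresses and nodal displacements have already been computed by a structural analysis, compares displacements by magnitude, and treats σ_min and σ_k^b as negative numbers; these conventions and the function name `truss_violations` are assumptions made for illustration.

```python
from typing import Dict, Sequence


def truss_violations(
    stress: Sequence[float],        # sigma_i, tension positive, compression negative
    disp: Sequence[float],          # delta_j (compared by magnitude, an assumption)
    sigma_min: float,               # allowable compressive stress (negative)
    sigma_max: float,               # allowable tensile stress
    delta_max: float,               # allowable displacement magnitude
    sigma_buckle: Sequence[float],  # buckling stress sigma_k^b of each member (negative)
) -> Dict[str, float]:
    """Total violation of Eqs. (12.5)-(12.7); every entry is 0 for a feasible design."""
    stress_v = sum(max(0.0, s - sigma_max) + max(0.0, sigma_min - s) for s in stress)
    disp_v = sum(max(0.0, abs(d) - delta_max) for d in disp)
    # Only members currently in compression (s < 0) are checked against buckling.
    buckle_v = sum(max(0.0, sb - s) for s, sb in zip(stress, sigma_buckle) if s < 0)
    return {"stress": stress_v, "deflection": disp_v, "buckling": buckle_v}


v = truss_violations([100.0, -50.0], [0.1, -0.3], -80.0, 80.0, 0.2, [-40.0, -40.0])
print(v)  # member 1 exceeds sigma_max by 20, member 2 buckles by 10, one node moves 0.1 too far
```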



For truss optimization problems, there are only two constant mechanical properties: the elastic modulus (E) and the material density (γ). Structural analysis of each truss can readily be carried out using the finite element method.
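For plane trusses, the finite element analysis reduces to the direct stiffness method with two-node bar elements. The following is a minimal, dependency-free sketch (dense Gaussian elimination, linear elastic, small displacements); the function names and the degree-of-freedom numbering 2·node + {0, 1} are choices made for this example. Large benchmarks such as the 200-bar truss would normally use a sparse linear algebra library instead.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A and b are overwritten)."""
    n = len(b)
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= m * A[k][c]
            b[r] -= m * b[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        x[k] = (b[k] - sum(A[k][c] * x[c] for c in range(k + 1, n))) / A[k][k]
    return x


def truss2d(nodes, members, areas, E, loads, fixed):
    """Linear-elastic analysis of a pin-jointed plane truss by the direct stiffness method.

    nodes   : list of (x, y) coordinates
    members : list of (i, j) node-index pairs
    areas   : cross-sectional area of each member
    E       : elastic modulus (same for all members)
    loads   : dict {dof: force}, where dof = 2*node (x) or 2*node + 1 (y)
    fixed   : set of restrained dofs
    Returns (displacement of every dof, axial stress of every member, tension positive).
    """
    ndof = 2 * len(nodes)
    K = [[0.0] * ndof for _ in range(ndof)]
    geometry = []
    for (i, j), area in zip(members, areas):
        dx = nodes[j][0] - nodes[i][0]
        dy = nodes[j][1] - nodes[i][1]
        length = math.hypot(dx, dy)
        c, s = dx / length, dy / length
        t = [-c, -s, c, s]  # elongation = t . (end displacements)
        dofs = [2 * i, 2 * i + 1, 2 * j, 2 * j + 1]
        k_axial = E * area / length
        for a in range(4):  # add element stiffness k_axial * t t^T
            for bb in range(4):
                K[dofs[a]][dofs[bb]] += k_axial * t[a] * t[bb]
        geometry.append((dofs, t, length))
    free = [d for d in range(ndof) if d not in fixed]
    K_ff = [[K[r][col] for col in free] for r in free]
    u_free = solve_linear(K_ff, [loads.get(d, 0.0) for d in free])
    u = [0.0] * ndof
    for d, value in zip(free, u_free):
        u[d] = value
    stresses = [E * sum(ta * u[d] for ta, d in zip(t, dofs)) / length
                for dofs, t, length in geometry]
    return u, stresses


# Two bars meeting at an apex (1, 1) from pinned supports at (0, 0) and (2, 0):
u, stresses = truss2d(nodes=[(0, 0), (2, 0), (1, 1)], members=[(0, 2), (1, 2)],
                      areas=[1.0, 1.0], E=1000.0, loads={5: -10.0}, fixed={0, 1, 2, 3})
print(stresses)
```

With a downward load of 10 at the apex and unit areas, each bar carries a compressive stress of magnitude 10/(2 sin 45°) = 5√2 ≈ 7.07, which is what the example prints (as negative values). The resulting stresses and displacements are the inputs to constraints (12.5)-(12.7).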

12.3.1.1 10-Bar Plane Truss 

This truss example is one of the most well-known structural optimization benchmarks [17]. It has been widely used by many researchers as a standard 2D benchmark for truss optimization (e.g., [16-18]). The geometry and loading of the 10-bar truss are presented in Figure 12.1. This problem has many variations and has been solved with either continuous or discrete variables. The main objective is to find the minimum weight of the truss by changing the cross-sectional areas of the elements, so it has 10 variables in total.

Fig. 12.1 10-bar truss structure



12.3.1.2 25-Bar Transmission Truss 



This spatial truss structure has been solved by many researchers as a benchmark structural problem [19]. The topology and nodal numbering of the 25-bar spatial truss structure are shown in Figure 12.2, where the 25 members are categorized into eight groups, so the problem has eight independent variables. This problem has been solved under various loading conditions (e.g., [10, 11, 20, 21]).
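Grouping means the optimizer only sees one area per group, which is then copied to every member of that group. A minimal sketch, using a hypothetical six-member, three-group assignment rather than the actual 25-bar grouping (which is given in the cited references):

```python
from typing import List, Sequence


def expand_groups(group_areas: Sequence[float], member_group: Sequence[int]) -> List[float]:
    """Return one cross-sectional area per member from one design variable per group.

    member_group[i] is the (0-based) group index of member i.
    """
    return [group_areas[g] for g in member_group]


member_group = [0, 1, 1, 2, 2, 2]   # hypothetical grouping of six members
print(expand_groups([1.0, 2.5, 0.8], member_group))
# [1.0, 2.5, 2.5, 0.8, 0.8, 0.8]
```

For a grouped benchmark, the optimizer's search space then has as many dimensions as there are groups (eight here), while the structural analysis still receives one area per member.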
Fig. 12.2 A 25-bar spatial truss [22]



12.3.1.3 72-Bar Truss 



The 72-bar truss is a challenging benchmark that has also been used by many researchers (e.g., [9, 18, 22, 23]). As shown in Figure 12.3, this truss has 16 independent groups of design variables. It is usually subjected to two different loading conditions.

Fig. 12.3 A 72-bar spatial truss [22]



12.3.1.4 120-Bar Truss Dome 

The 120-bar truss dome is used as a benchmark problem in some studies (e.g., [10, 16]). This symmetrical space truss, shown in Figure 12.4, has a diameter of 31.78 m, and its 120 members are divided into 7 groups, taking the symmetry of the structure into account. Because of symmetry, the design of one-fourth of the dome is sufficient. The truss is subjected to vertical loading at all the unsupported joints. According to the American Institute of Steel Construction (AISC) code for allowable stress design (ASD) [24], the allowable tensile stress (σ_max) is equal to 0.6F_y (where F_y is the yield stress of the steel), and the allowable compressive stress (σ_min) is calculated according to the slenderness of the member.

Fig. 12.4 A 120-bar dome-shaped truss [10]



12.3.1.5 200-Bar Plane Truss 



The benchmark 200-bar plane truss structure shown in Figure 12.5 has been solved in many papers with different numbers of variables. Using symmetry, the 200 structural members of this planar truss have been categorized into 29 [11], 96 [25] or 105 [26] groups in the literature. Some researchers have also solved it with 200 variables, treating each member as an independent design variable [27]. This planar truss problem has been solved with three or five independent loading conditions [28].

Fig. 12.5 A 200-bar plane truss



12.3.1.6 26-Story Truss Tower 

Figure 12.6 shows the geometry and the element groups of the recently developed 26-story-tower space truss ([10, 11, 29-31]). This truss is a large-scale truss problem containing 244 nodes and 942 elements. Using the symmetry of the structure, the elements are arranged into 59 groups. This problem has been solved both as a continuous problem and as a discrete problem [30]. More details of this problem can be found in [31].

[Figure 12.6 dimension labels: 21.95 m (72 ft), 29.26 m (96 ft), 43.89 m (144 ft)]

Fig. 12.6 A 26-story truss tower [10]



12.3.2 Non-truss Design Problems

12.3.2.1 Welded Beam

The design of a welded beam that minimizes the overall cost of fabrication was introduced as a benchmark structural engineering problem by Rao [19]. Figure 12.7 shows a beam of low-carbon steel (C-1010) welded to a rigid support. The welded beam is fixed and designed to support a load (P). The thickness of the weld (h), the length of the welded joint (l), the width of the beam (t) and the thickness of the beam (b) are the design variables. The values of h and l can only take integer multiples of 0.0065, but many researchers consider them continuous variables [32]. The objective function of the problem is expressed as follows:







Fig. 12.7 Welded beam design problem

$$\text{Minimize: } f(h, l, t, b) = (1 + C_1)\, h^2 l + C_2\, t b\, (L + l) \qquad (12.8)$$

subject to the following five constraints:

shear stress (τ):

$$g_1 = \tau_d - \tau \ge 0 \qquad (12.9)$$

bending stress in the beam (σ):

$$g_2 = \sigma_d - \sigma \ge 0 \qquad (12.10)$$

side constraint:

$$g_3 = b - h \ge 0 \qquad (12.11)$$

buckling load on the bar (P_c):

$$g_4 = P_c - P \ge 0 \qquad (12.12)$$

deflection of the beam (δ):

$$g_5 = 0.25 - \delta \ge 0 \qquad (12.13)$$

where

$$\tau = \sqrt{(\tau')^2 + (\tau'')^2 + \frac{l\,\tau'\tau''}{\sqrt{0.25\left(l^2 + (h+t)^2\right)}}} \qquad (12.14)$$

$$\sigma = \frac{504000}{t^2 b} \qquad (12.15)$$

$$P_c = 64746\,(1 - 0.0282346\,t)\, t b^3 \qquad (12.16)$$

$$\delta = \frac{2.1952}{t^3 b} \qquad (12.17)$$

$$\tau' = \frac{6000}{\sqrt{2}\, h l} \qquad (12.18)$$

$$\tau'' = \frac{6000\,(14 + 0.5\,l)\sqrt{0.25\left(l^2 + (h+t)^2\right)}}{2\left\{0.707\, h l \left(l^2/12 + 0.25\,(h+t)^2\right)\right\}} \qquad (12.19)$$



The simple bounds of the problem are 0.125 ≤ h ≤ 5, 0.1 ≤ l, t ≤ 10 and 0.1 ≤ b ≤ 5. The constant values for the formulation are given in Table 12.1.

Table 12.1 Constant values in the welded beam problem

Constant   Description                                   Value
C1         cost per volume of the welded material        0.10471 ($/in³)
C2         cost per volume of the bar stock              0.04811 ($/in³)
τ_d        design shear stress of the welded material    13600 (psi)
σ_d        design normal stress of the bar material      30000 (psi)
δ_d        design bar end deflection                     0.25 (in)
E          Young's modulus of bar stock                  30×10⁶ (psi)
G          shear modulus of bar stock                    12×10⁶ (psi)
P          loading condition                             6000 (lb)
L          overhang length of the beam                   14 (in)

This problem has been solved by many researchers in the literature (e.g., [15, 33, 34]), and two different solutions have been presented. One has an optimal function value of around 2.38, and the other (with a difference in one of the constraints) has an optimal function value of about 1.7. Deb and Goyal [35] extended this problem to choose one of four types of beam materials and one of two types of welded joint configurations.
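The formulation (12.8)-(12.19) with the constants of Table 12.1 can be coded directly. The sketch below follows the sign convention of this section (g_k ≥ 0 means the constraint is satisfied), reads `l` as the weld length and `L` as the fixed 14 in overhang, and keeps the coefficient 64746 of Eq. (12.16) as printed; because the two solutions quoted above come from slightly different formulations, values from other versions of the problem are not directly comparable. The function name is illustrative.

```python
import math

# Constants from Table 12.1
C1, C2 = 0.10471, 0.04811          # cost per volume, $/in^3
TAU_D, SIGMA_D = 13600.0, 30000.0  # design shear / normal stress, psi
P, L = 6000.0, 14.0                # load (lb) and overhang length (in)


def welded_beam(h: float, l: float, t: float, b: float):
    """Return (cost, [g1, ..., g5]) for Eqs. (12.8)-(12.19); g_k >= 0 means satisfied."""
    cost = (1 + C1) * h**2 * l + C2 * t * b * (L + l)                     # (12.8)
    R = math.sqrt(0.25 * (l**2 + (h + t) ** 2))
    tau1 = P / (math.sqrt(2) * h * l)                                     # (12.18)
    tau2 = (P * (L + 0.5 * l) * R
            / (2 * (0.707 * h * l * (l**2 / 12 + 0.25 * (h + t) ** 2))))   # (12.19)
    tau = math.sqrt(tau1**2 + tau2**2 + l * tau1 * tau2 / R)             # (12.14)
    sigma = 504000.0 / (t**2 * b)                                         # (12.15)
    p_c = 64746.0 * (1 - 0.0282346 * t) * t * b**3                        # (12.16)
    delta = 2.1952 / (t**3 * b)                                           # (12.17)
    g = [TAU_D - tau, SIGMA_D - sigma, b - h, p_c - P, 0.25 - delta]      # (12.9)-(12.13)
    return cost, g


cost, g = welded_beam(0.25, 6.0, 8.0, 0.25)  # an arbitrary trial design
print(round(cost, 4), [gk >= 0 for gk in g])
```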

12.3.2.2 Reinforced Concrete Beam

The problem of designing a reinforced concrete beam has many variations and has been solved by various researchers with different kinds of constraints (e.g., [36, 37]). A simplified optimization problem minimizing the total cost of a reinforced concrete beam, shown in Figure 12.8, was presented by Amir and Hasegawa [38]. The beam is simply supported with a span of 30 ft and subjected to a live load of 2.0 klbf/ft and a dead load of 1.0 klbf/ft, including the weight of the beam. The concrete compressive strength (σ_c) is 5 ksi, and the yield stress of the reinforcing steel (F_y) is 50 ksi. The cost of concrete is $0.02/in²/linear ft and the cost of steel is $1.0/in²/linear ft. The aim of the design is to determine the area of the reinforcement (A_s), the width of the beam (b) and the depth of the beam (h) such that the total cost of the structure is minimized. Herein, the cross-sectional area of the reinforcing bar (A_s) is taken as a discrete variable that must be chosen from the standard bar dimensions listed in [38]. The width of the concrete beam (b) is assumed to be an integer variable, and the depth (h) of the beam is a continuous variable. The effective depth is assumed to be 0.8h.

[Figure 12.8 labels: load 3 klbf; reinforcement A_s; span 30 ft]

Fig. 12.8 Illustration of reinforced concrete beam

Then, the optimization problem can be expressed as:

$$\text{Minimize: } f(A_s, b, h) = 29.4\,A_s + 0.6\,b h \qquad (12.20)$$



The depth-to-width ratio of the beam is restricted to be less than or equal to 4, so the first constraint can be written as:

$$g_1 = \frac{h}{b} - 4 \le 0 \qquad (12.21)$$



The structure should satisfy the American Concrete Institute (ACI) building code 318-77 [39] with a bending strength:

$$M_u = 0.9\,A_s F_y\,(0.8h)\left(1.0 - 0.59\,\frac{A_s F_y}{0.8\,b h\,\sigma_c}\right) \ge 1.4\,M_d + 1.7\,M_l \qquad (12.22)$$

where M_u, M_d and M_l are, respectively, the flexural strength, the dead load moment and the live load moment of the beam. In this case, M_d = 1350 in·kip and M_l = 2700 in·kip. This constraint can be simplified as [40]:

$$g_2 = 180 + 7.375\,\frac{A_s^2}{h} - A_s b \le 0 \qquad (12.23)$$



The bounds of the variables are b ∈ {28, 29, ..., 40} inches and 5 ≤ h ≤ 10 inches, and A_s is a discrete variable that must be chosen from the possible reinforcing bars given by ACI. The best solution obtained by existing methods so far is 359.208, with b = 34, h = 8.5 and A_s = 6.32 (15#6 or 11#7), found using the firefly algorithm [41].
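A sketch of the cost and constraint evaluation, assuming the objective coefficient 29.4 for A_s (the value that reproduces the reported optimum of 359.208 at b = 34, h = 8.5, A_s = 6.32, the assignment consistent with the stated bounds) and the roles of b and h in Eqs. (12.21) and (12.23) as printed. Choosing A_s from the ACI bar list is left out, since that list is not reproduced here; the function names are hypothetical.

```python
def rc_beam(As: float, b: int, h: float):
    """Return (cost, [g1, g2]); a constraint is satisfied when g <= 0."""
    cost = 29.4 * As + 0.6 * b * h
    g1 = h / b - 4.0                          # Eq. (12.21)
    g2 = 180.0 + 7.375 * As**2 / h - As * b   # Eq. (12.23)
    return cost, [g1, g2]


def within_bounds(b: int, h: float) -> bool:
    """b must be an integer in {28, ..., 40} and 5 <= h <= 10."""
    return b in range(28, 41) and 5.0 <= h <= 10.0


cost, g = rc_beam(6.32, 34, 8.5)
print(round(cost, 3), all(gk <= 0 for gk in g), within_bounds(34, 8.5))
# 359.208 True True
```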



12.3.2.3 Compression Spring

The problem of spring design has many variations and has been solved by various researchers. Sandgren [42] minimized the volume of a coil compression spring with mixed variables, and Deb and Goyal [35] tried to minimize the weight of a Belleville spring. The most well-known spring problem is the design of a tension-compression spring for minimum weight [43]. Figure 12.9 shows a tension-compression spring with three design variables: the wire diameter (d), the mean coil diameter (D), and the number of active coils (N). The weight of the spring is to be minimized, subject to constraints on the minimum deflection (g1), shear stress (g2) and surge frequency (g3), and to a limit on the outside diameter (g4) [43]. The problem can be expressed as follows:



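$$\text{Minimize: } f(N, D, d) = (N + 2)\, D d^2 \qquad (12.24)$$

subject to:

$$g_1 = 1 - \frac{D^3 N}{71785\, d^4} \le 0 \qquad (12.25)$$

$$g_2 = \frac{4D^2 - dD}{12566\,(D d^3 - d^4)} + \frac{1}{5108\, d^2} - 1 \le 0 \qquad (12.26)$$

$$g_3 = 1 - \frac{140.45\, d}{D^2 N} \le 0 \qquad (12.27)$$

$$g_4 = \frac{D + d}{1.5} - 1 \le 0 \qquad (12.28)$$

where 0.05 ≤ d ≤ 1, 0.25 ≤ D ≤ 1.3 and 2 ≤ N ≤ 15.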



Fig. 12.9 Tension-compression spring

Many researchers have tried to solve this problem (e.g., [33, 44, 45]), and the best result obtained so far appears to be 0.0126652, with d = 0.05169, D = 0.35673 and N = 11.28846, found using the bat algorithm [46].
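The spring formulation (12.24)-(12.28) is compact enough to evaluate in a few lines. The sketch below uses the convention g_k ≤ 0 for a satisfied constraint; the function name is illustrative. Evaluated at the solution quoted above, it gives a weight of about 0.01267 (the small difference from 0.0126652 comes from the rounding of the printed variable values), with g1 and g2 very close to zero, i.e. active.

```python
def spring(d: float, D: float, N: float):
    """Return (weight, [g1, ..., g4]) for Eqs. (12.24)-(12.28); g_k <= 0 means satisfied."""
    weight = (N + 2) * D * d**2
    g1 = 1 - D**3 * N / (71785 * d**4)
    g2 = (4 * D**2 - d * D) / (12566 * (D * d**3 - d**4)) + 1 / (5108 * d**2) - 1
    g3 = 1 - 140.45 * d / (D**2 * N)
    g4 = (D + d) / 1.5 - 1
    return weight, [g1, g2, g3, g4]


weight, g = spring(0.05169, 0.35673, 11.28846)
print(round(weight, 5), [round(gk, 4) for gk in g])
```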

12.3.2.4 Pressure Vessel

A pressure vessel is a closed container that holds gases or liquids at a pressure typically significantly higher than the ambient pressure. A cylindrical pressure vessel capped at both ends by hemispherical heads is presented in Figure 12.10. Pressure vessels are widely used for engineering purposes, and this optimization problem was proposed by Sandgren [42]. This compressed air tank has a working pressure of 3000 psi and a minimum volume of 750 ft³, and is designed according to the American Society of Mechanical Engineers (ASME) boiler and pressure vessel code. The total cost, which includes a welding cost, a material cost, and a forming cost, is to be minimized. The variables are the thickness of the shell (T_s), the thickness of the head (T_h), the inner radius (R), and the length of the cylindrical section of the vessel (L). The thicknesses (T_s and T_h) can only take integer multiples of 0.0625 inch.




Fig. 12.10 Pressure Vessel 



Then, the optimization problem can be expressed as follows:

$$\text{Minimize: } f(T_s, T_h, R, L) = 0.6224\,T_s R L + 1.7781\,T_h R^2 + 3.1661\,T_s^2 L + 19.84\,T_s^2 R \qquad (12.29)$$

The constraints are defined in accordance with the ASME design codes, where g3 represents the minimum volume constraint of 750 ft³ and the others are geometrical constraints. The constraints are as follows:



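$$g_1 = -T_s + 0.0193\,R \le 0 \qquad (12.30)$$

$$g_2 = -T_h + 0.00954\,R \le 0 \qquad (12.31)$$

$$g_3 = -\pi R^2 L - \frac{4}{3}\pi R^3 + 750 \times 1728 \le 0 \qquad (12.32)$$

$$g_4 = L - 240 \le 0 \qquad (12.33)$$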



where 1×0.0625 ≤ T_s, T_h ≤ 99×0.0625 and 10 ≤ R, L ≤ 200. The minimum cost and the statistical values of the best solution obtained in about forty different studies are reported in [47]. According to that paper, the best result is a total cost of $6059.714. Although nearly all researchers use 200 as the upper limit of variable L, it was extended to 240 in a few studies (e.g., [41]) in order to investigate the last constrained problem region. Using this bound, the best result decreased to about $5850, so this variation may be a new, challenging benchmark problem. It should also be noted that if an approximate value of π is used in the g3 constraint calculation, the best result cannot be reproduced (a smaller value will actually be obtained). Thus, π should be represented as accurately as possible when implementing this problem.
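The sketch below evaluates Eqs. (12.29)-(12.33), assuming the coefficients 1.7781, 3.1661, 19.84 (on T_s²R) and 0.00954 of the widely used version of this problem. It exposes π as a parameter purely to illustrate the remark above: with a crude value such as 3.14, the volume in g3 is underestimated by about 0.05%, which can turn a design that is exactly feasible with `math.pi` into an apparently infeasible one (and a value larger than π would do the opposite). The function name is illustrative.

```python
import math


def pressure_vessel(Ts: float, Th: float, R: float, L: float, pi: float = math.pi):
    """Return (cost, [g1, ..., g4]) for Eqs. (12.29)-(12.33); g_k <= 0 means satisfied.

    `pi` is a parameter only to show how an approximate value shifts g3.
    """
    cost = (0.6224 * Ts * R * L + 1.7781 * Th * R**2
            + 3.1661 * Ts**2 * L + 19.84 * Ts**2 * R)
    g1 = -Ts + 0.0193 * R
    g2 = -Th + 0.00954 * R
    g3 = -pi * R**2 * L - (4.0 / 3.0) * pi * R**3 + 750 * 1728  # 750 ft^3 in in^3
    g4 = L - 240
    return cost, [g1, g2, g3, g4]


design = (0.8125, 0.4375, 40.0, 200.0)  # an arbitrary trial design
_, g_exact = pressure_vessel(*design)
_, g_rough = pressure_vessel(*design, pi=3.14)
print(g_exact[2], g_rough[2])  # the crude pi makes g3 larger, i.e. less slack
```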

12.3.2.5 Speed Reducer

A speed reducer is part of the gear box of a mechanical system, and it is also used in many other types of applications. The design of a speed reducer is a more challenging benchmark [48], because it involves seven design variables. As shown in Figure 12.11, these variables are the face width (b), the module of the teeth (m), the number of teeth on the pinion (z), the length of the first shaft between bearings (l1), the length of the second shaft between bearings (l2), the diameter of the first shaft (d1), and the diameter of the second shaft (d2).



Fig. 12.11 Speed Reducer



The objective is to minimize the total weight of the speed reducer. There are 
nine constraints, including the limits on the bending stress of the gear teeth, sur- 
face stress, transverse deflections of shafts 1 and 2 due to transmitted force, and 
stresses in shafts 1 and 2. 

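The mathematical formulation can be summarized as follows:

$$\text{Minimize: } f(b, m, z, l_1, l_2, d_1, d_2) = 0.7854\, b m^2 (3.3333 z^2 + 14.9334 z - 43.0934) - 1.508\, b (d_1^2 + d_2^2) + 7.4777 (d_1^3 + d_2^3) + 0.7854 (l_1 d_1^2 + l_2 d_2^2) \qquad (12.34)$$

subject to:

$$g_1 = \frac{27}{b m^2 z} - 1 \le 0 \qquad (12.35)$$

$$g_2 = \frac{397.5}{b m^2 z^2} - 1 \le 0 \qquad (12.36)$$

$$g_3 = \frac{1.93\, l_1^3}{m z d_1^4} - 1 \le 0 \qquad (12.37)$$

$$g_4 = \frac{1.93\, l_2^3}{m z d_2^4} - 1 \le 0 \qquad (12.38)$$

$$g_5 = \frac{\sqrt{\left(745\, l_1/(m z)\right)^2 + 16.9 \times 10^6}}{110\, d_1^3} - 1 \le 0 \qquad (12.39)$$

$$g_6 = \frac{\sqrt{\left(745\, l_2/(m z)\right)^2 + 157.5 \times 10^6}}{85\, d_2^3} - 1 \le 0 \qquad (12.40)$$

$$g_7 = \frac{m z}{40} - 1 \le 0 \qquad (12.41)$$

$$g_8 = \frac{5 m}{b} - 1 \le 0 \qquad (12.42)$$

$$g_9 = \frac{b}{12 m} - 1 \le 0 \qquad (12.43)$$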



In addition, the design variables are subject to the simple bounds listed in column 2 of Table 12.2. This problem has been solved by many researchers (e.g., [49, 50]), and the best weight of the speed reducer found so far appears to be about 3000 (kg) [47, 51]. The variable values of this best solution are presented in Table 12.2.

Table 12.2 Variables of the speed reducer design example

Variable   Simple bounds   Best solution
b          [2.6, 3.6]      3.50000
m          [0.7, 0.8]      0.70000
z          [17, 28]        17.0000
l1         [7.3, 8.3]      7.30001
l2         [7.3, 8.3]      7.71532
d1         [2.9, 3.9]      3.35021
d2         [5.0, 5.5]      5.28665
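A direct transcription of Eqs. (12.34)-(12.43), with x = (b, m, z, l1, l2, d1, d2) and g_k ≤ 0 for a satisfied constraint. Evaluating it at the values in Table 12.2 gives a weight of about 2994.5, in line with the figure of about 3000 quoted above, with several constraints (for example g5, g6 and g8) essentially active. The name `speed_reducer` is illustrative, and z is treated as a real number here although it counts teeth.

```python
import math
from typing import List, Sequence, Tuple


def speed_reducer(x: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (weight, [g1, ..., g9]) for Eqs. (12.34)-(12.43); g_k <= 0 means satisfied."""
    b, m, z, l1, l2, d1, d2 = x
    weight = (0.7854 * b * m**2 * (3.3333 * z**2 + 14.9334 * z - 43.0934)
              - 1.508 * b * (d1**2 + d2**2)
              + 7.4777 * (d1**3 + d2**3)
              + 0.7854 * (l1 * d1**2 + l2 * d2**2))
    g = [
        27 / (b * m**2 * z) - 1,                                         # bending stress
        397.5 / (b * m**2 * z**2) - 1,                                   # surface stress
        1.93 * l1**3 / (m * z * d1**4) - 1,                              # deflection, shaft 1
        1.93 * l2**3 / (m * z * d2**4) - 1,                              # deflection, shaft 2
        math.sqrt((745 * l1 / (m * z)) ** 2 + 16.9e6) / (110 * d1**3) - 1,   # stress, shaft 1
        math.sqrt((745 * l2 / (m * z)) ** 2 + 157.5e6) / (85 * d2**3) - 1,   # stress, shaft 2
        m * z / 40 - 1,
        5 * m / b - 1,
        b / (12 * m) - 1,
    ]
    return weight, g


w, g = speed_reducer((3.5, 0.7, 17.0, 7.30001, 7.71532, 3.35021, 5.28665))
print(round(w, 2), max(g))
```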



12.3.2.6 Stepped Cantilever Beam

This problem is a good benchmark to verify the capability of optimization methods for solving continuous, discrete, and/or mixed-variable structural design problems. This benchmark was originally presented by Thanedar and Vanderplaats [52] with ten variables, and it has been solved with continuous, discrete and mixed variables in different cases in the literature [8, 53]. Figure 12.12 illustrates a five-stepped cantilever beam with a rectangular cross section. In this problem, the height and width of the beam in all five steps are the design variables, and the volume of the beam is to be minimized. The objective function is formulated as follows:

$$\text{Minimize: } V = b_1 h_1 l_1 + b_2 h_2 l_2 + b_3 h_3 l_3 + b_4 h_4 l_4 + b_5 h_5 l_5 \qquad (12.44)$$



Fig. 12.12 A stepped cantilever beam



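subject to the following constraints:

• The bending stress in each of the five steps of the beam must be less than the design stress (σ_d):

$$g_1 = \frac{6 P l_5}{b_5 h_5^2} - \sigma_d \le 0 \qquad (12.45)$$

$$g_2 = \frac{6 P (l_5 + l_4)}{b_4 h_4^2} - \sigma_d \le 0 \qquad (12.46)$$

$$g_3 = \frac{6 P (l_5 + l_4 + l_3)}{b_3 h_3^2} - \sigma_d \le 0 \qquad (12.47)$$

$$g_4 = \frac{6 P (l_5 + l_4 + l_3 + l_2)}{b_2 h_2^2} - \sigma_d \le 0 \qquad (12.48)$$

$$g_5 = \frac{6 P (l_5 + l_4 + l_3 + l_2 + l_1)}{b_1 h_1^2} - \sigma_d \le 0 \qquad (12.49)$$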



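• The tip deflection must be less than the allowable deflection (Δ_max):

$$g_6 = \frac{P l^3}{3E}\left(\frac{1}{I_5} + \frac{7}{I_4} + \frac{19}{I_3} + \frac{37}{I_2} + \frac{61}{I_1}\right) - \Delta_{\max} \le 0 \qquad (12.50)$$

where l is the length of each segment (all segments have the same length) and I_i = b_i h_i³/12 is the second moment of area of segment i.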
• The aspect ratio (height to width) of each of the five cross sections of the beam must not exceed 20:

$$g_7 = \frac{h_5}{b_5} - 20 \le 0 \qquad (12.51)$$

$$g_8 = \frac{h_4}{b_4} - 20 \le 0 \qquad (12.52)$$

$$g_9 = \frac{h_3}{b_3} - 20 \le 0 \qquad (12.53)$$

$$g_{10} = \frac{h_2}{b_2} - 20 \le 0 \qquad (12.54)$$

$$g_{11} = \frac{h_1}{b_1} - 20 \le 0 \qquad (12.55)$$

The initial design space for the cases with continuous, discrete and mixed-variable formulations can be found in Thanedar and Vanderplaats [52].

This problem can be used as a large-scale optimization problem if the number of segments of the beam is increased. When the beam has N segments, it has 2N+1 constraints: N stress constraints, N aspect ratio constraints and one displacement constraint. Vanderplaats [54] solved this problem as a very large-scale structural optimization problem with up to 25,000 segments and 50,000 variables.
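With equal segment lengths l and a tip load P, the unit-load method gives the contribution of the k-th segment counted from the free end as P l³ (k³ − (k − 1)³)/(3 E I_k); for k = 1, ..., 5 these factors are exactly the 1, 7, 19, 37, 61 of Eq. (12.50). That makes it easy to write the problem for any number of segments N, as in the large-scale version mentioned above. The sketch below assumes equal segment lengths and a single tip load; the function name is hypothetical, and the numbers in the example are placeholders rather than the data of [52].

```python
from typing import List, Sequence, Tuple


def stepped_cantilever(
    b: Sequence[float], h: Sequence[float], seg_len: float,
    P: float, E: float, sigma_d: float, delta_max: float, max_ratio: float = 20.0,
) -> Tuple[float, List[float]]:
    """Volume and the 2N+1 constraints (g <= 0 satisfied) of an N-segment stepped cantilever.

    Index 0 is the segment at the support (segment 1) and index N-1 is the tip segment.
    Returned order: N stress constraints (support to tip), the tip deflection,
    then N aspect-ratio constraints (support to tip).
    """
    n = len(b)
    volume = sum(bi * hi * seg_len for bi, hi in zip(b, h))
    stress, ratio = [], []
    flexibility = 0.0
    for i in range(n):
        k = n - i                      # position counted from the tip (1 = tip segment)
        moment_arm = k * seg_len       # distance from the tip to the root of this segment
        stress.append(6 * P * moment_arm / (b[i] * h[i] ** 2) - sigma_d)
        inertia = b[i] * h[i] ** 3 / 12.0
        flexibility += (k**3 - (k - 1) ** 3) / inertia   # 1, 7, 19, 37, 61 for N = 5
        ratio.append(h[i] / b[i] - max_ratio)
    tip = P * seg_len**3 / (3 * E) * flexibility - delta_max
    return volume, stress + [tip] + ratio


vol, g = stepped_cantilever([3.0] * 5, [60.0] * 5, seg_len=100.0,
                            P=1.0, E=1.0, sigma_d=1.0, delta_max=1.0)
print(vol, len(g))   # 90000.0 11
```

Because the factors telescope, their sum for a uniform beam is N³, which is a quick sanity check when N is increased to build a large-scale instance.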

12.3.2.7 Frame Structures

Frame design is one of the popular structural optimization benchmarks. Many researchers have attempted to solve frame structures as real-world, discrete-variable problems using different methods (e.g., [55, 56]). The design variables of frame structures are the cross sections of beams and columns, which have to be chosen from standardized cross sections. Recently, Hasançebi et al. [57] compared seven well-known structural design algorithms for the weight minimization of some steel frames: ant colony optimization, evolution strategies, harmony search, simulated annealing, particle swarm optimization, tabu search and genetic algorithms. They showed that, among these algorithms, simulated annealing and evolution strategies performed best for frame optimization.

One of the well-known frame structures was introduced by Khot et al. [58]. This problem has been solved by many researchers (e.g., [59, 60]) and can now be considered a frame-structure benchmark. The frame has one bay and eight stories and is subjected to the applied loads shown in Figure 12.13. This problem has eight element groups, and the cross section of each group is chosen from all 267 W-shapes of the AISC.
[Figure 12.13 labels: dimension 3.4 m; lateral loads 12.529, 8.743, 7.264, 6.054, 4.839, 3.630, 2.420 and 1.210 kN]

Fig. 12.13 The benchmark frame



12.4 Discussions and Further Research 



Thirteen benchmark problems in structural optimization, all widely used in the literature, have been briefly introduced here. Our intention is to introduce each of these benchmarks briefly so that readers are aware of these problems and can refer to the cited literature for more details. Since a detailed description of each problem would be lengthy, we only highlight the essence of each problem and provide sufficient references.
There are many other benchmark problem sets in engineering optimization, and there are no agreed-upon guidelines for their use. Interested readers can find more information about additional benchmarks in recent books and review articles [61, 62].